arXiv:1504.03381v1 [stat.ME] 13 Apr 2015




Convex Combination 
of Ordinary Least Squares and 
Two-stage Least Squares Estimators 

Cedric E. Ginestet^*, Richard Emsley^, and Sabine Landau^ 

^ Biostatistics Department, Institute of Psychiatry, 

Psychology and Neuroscience, King’s College London 

^ Centre for Biostatistics, Institute of Population Health, University of Manchester 

Abstract: In the presence of confounders, the ordinary least squares
(OLS) estimator is known to be biased. This problem can be remedied by
using the two-stage least squares (TSLS) estimator, based on the availability
of valid instrumental variables (IVs). This reduction in bias, however, is
offset by an increase in variance. Under standard assumptions, the OLS
estimator indeed has a larger bias than the TSLS estimator; moreover, one can
prove that the sample variance of the OLS estimator is no greater than that
of the TSLS estimator. Therefore, it is natural to ask whether one could
combine the desirable properties of the OLS and TSLS estimators. Such a
trade-off can be achieved through a convex combination of these two
estimators, thereby producing our proposed convex least squares (CLS)
estimator. The relative contribution of the OLS and TSLS estimators is here
chosen to minimize a sample estimate of the mean squared error (MSE) of their
convex combination. This proportion parameter is proved to be unique whenever
the OLS and TSLS estimators differ in MSE. Remarkably, we show that this
proportion parameter can be estimated from the data, and that the resulting
CLS estimator is consistent. We also show how the CLS framework can
incorporate other asymptotically unbiased estimators, such as the jackknife IV
estimator (JIVE). The finite-sample properties of the CLS estimator are
investigated using Monte Carlo simulations, in which we independently vary
the amount of confounding and the strength of the instrument. Overall,
the CLS estimator is found to outperform the TSLS estimator in terms of
MSE. The method is also applied to a classic data set from econometrics,
which models the financial return to education.

AMS 2000 subject classifications: Convex combination, Econometrics,
Instrumental variables, Ordinary least squares, Two-stage least squares.


1. Introduction 

Instrumental variables (IVs) estimation is one of the cornerstones of modern
econometric theory. The use of IVs has been described as “only second to
ordinary least squares (OLS) in terms of methods used in empirical economic
research” (Wooldridge, 2002, p.89). This ranking of estimation techniques
naturally leads to the following methodological questions: When should we
prefer IV estimation over OLS? Is it always preferable to use an instrument
even though this may substantially increase the variance of the resulting
estimator?

In fields including econometrics and the social sciences, and in some medical
disciplines such as psychiatry, the direct randomized allocation of subjects
to different experimental conditions is rarely possible, which prevents
researchers from inferring causal relations. Without adequate experimental
manipulation, the model’s predictors may be correlated with the errors. When
this is the case, we say that the predictors are endogenous. The absence of
experimental manipulation in observational data, however, can be addressed by
using IVs to predict the alleged causal variables. In particular, the
resulting IV estimators make it possible to reduce the bias of the estimated
effect. The main difficulty in conducting such IV analyses lies in the choice
of appropriate exogenous instruments. Indeed, instruments are assumed to be
correlated with the outcome variable solely through the predictor. This
specific assumption is sometimes referred to as the exclusion criterion, since
it disallows any direct effect of the instrument on the outcome.

The first published use of IVs is commonly attributed to Wright (1928) in
the context of microeconometrics, although this attribution has been disputed
(see Stock and Trebbi, 2003). This estimation technique has been widely
adopted in econometrics, and in other social sciences, including psychology,
epidemiology, public health and political science. In particular, the use of
IV methods has now become an integral part of causal inference (Pearl, 2009).
The use of IVs in regression has been extended in several directions,
allowing two-sample estimation, for instance (Inoue and Solon, 2010), and the
selection of instruments using penalized methods such as the LASSO (Ng and
Bai, 2009, Belloni et al., 2012). More recently, these methods have become
especially popular in the study of genetic variants, thereby demonstrating
the wide applicability of IV-based methods (Palmer et al., 2012, Pierce and
Burgess, 2013). The reader may consult Wooldridge (2002) and Cameron and
Trivedi (2005) for an introduction to the use of instrumental variables in
the context of econometrics. A review of the assumptions underlying the use
of IVs is provided by Angrist and Krueger (2001), and Heckman (1997); whereas
the finite-sample properties of IV estimators have been described by Maddala
and Jeong (1992) and Nelson and Startz (1990).

While the asymptotic properties of IV estimators such as the two-stage least
squares (TSLS) are well-understood (Staiger and Stock, 1997, Hahn et al.,
2004), in practice, it is not always clear whether using an IV estimator
instead of a simpler OLS estimator is necessarily beneficial. Intuitively,
since every IV is a random variable, its inclusion in the analysis tends to
increase the variance of the resulting estimator. The magnitude of this
increase in variance is inversely related to the strength of the correlation
between the instrument and the predictor. Poor or weak instruments are
variables that are weakly correlated with the endogenous variables in the
model. Thus, although the use of an IV estimator is likely to lead to a
significant decrease in the bias of the OLS estimator, it will also yield a
more variable estimator. Since the true value of the parameters of interest
is unknown in practice, it is generally not possible to evaluate whether the
benefit of using a given set of instruments outweighs the cost in variance of
incorporating them into the analysis. In addition, the use of weak
instruments can also lead to a substantial amount of finite-sample bias.
Indeed, the use of weak instruments has been studied by Bound et al. (1995),
and these authors have shown that the inclusion of instruments with only weak
linear relationships with the endogenous variables tends to inflate the bias
of the IV estimator, ultimately yielding an estimator as biased as the
original OLS estimator.

In this paper, we address this issue by proposing a sample estimate of the
mean squared error (MSE) of the estimators of interest. Since the MSE can
be decomposed into a bias and a variance component, it provides us with a
natural criterion for combining the OLS and TSLS estimators. Crucially,
however, the proportion parameter weighting the relative contributions of the
two candidate estimators is adaptive, in the sense that it depends on the
properties of the data, and takes into account the strength of the
instruments. The idea of combining the OLS and TSLS estimators has been
previously discussed in the literature (Angrist et al., 1995). In particular,
Sawa (1973) has proposed an “almost unbiased estimator” for simultaneous
equations systems, which strikes a balance between two different k-class
estimators by weighting their relative contributions using the sample size
and the number of variables in the model. Moreover, Angrist et al. (1995)
have given an interpretation of the limited information maximum likelihood
(LIML) estimator as a combination estimator, which relies on a weighting of
the OLS and TSLS estimators. Such combined estimators, however, do not
attempt to estimate the respective contributions of each estimator using the
data, as we have done in the paper at hand. The main contribution of this
article is therefore to provide a framework for estimating such proportions
in a data-informed adaptive manner.

The paper is organized as follows. In section 2, we fix the notation, and 
briefly recall the assumptions behind OLS and TSLS estimation. We then show 
that these two estimators have complementary properties, in the sense that 
the OLS has minimal variance, while the TSLS is asymptotically unbiased. In 
section 2.4, we describe our proposed convex estimator, and study its asymptotic 
properties, under the assumption that the optimal proportion is known; whereas 
in section 2.5, we describe a sample estimator of this proportion parameter. This 
framework is then extended to other asymptotically unbiased estimators in a 
third section. In section 4, these theoretical results are tested using a range of 
different synthetic data sets. The proposed methods are also applied to a classic 
data set from econometrics in section 5, and some conclusions are provided in 
section 6. Finally, the proofs of all the propositions in the paper are reported in
the appendix.

2. Combining OLS and TSLS Estimators

2.1. Ordinary Least Squares (OLS) 

The model under scrutiny is described by the following linear relationship,

$$Y = X\beta + \epsilon, \tag{1}$$

where $X$ is a random row vector of order $1 \times k$, and $\beta$ is a
column vector of order $k \times 1$ representing the parameters of interest,
while $Y$ and $\epsilon$ are two real-valued random variables. Throughout, we
will treat both the error term, $\epsilon$, and the vector of predictors, $X$,
as random quantities, thereby allowing for possible correlations between the
$X_j$'s and $\epsilon$. For expediency, all random variables, regardless of
their dimensions, are simply denoted by upper-case Roman letters. In general,
a sample of $n$ draws will be available from the model in equation (1), such
that

$$y_i = x_i\beta + \epsilon_i, \qquad \forall\, i = 1,\ldots,n;$$

where $x_i$ is again a row vector of order $1 \times k$. This may also be
written using matrix notation as

$$\mathbf{y} = \mathbf{X}\beta + \boldsymbol{\epsilon},$$

where $\mathbf{y}$ and $\boldsymbol{\epsilon}$ are column vectors of order
$n \times 1$, and $\mathbf{X}$ is a matrix of order $n \times k$.

The estimation of the unknown vector of parameters, $\beta$, can be performed
by making some standard assumptions about the moments of the different random
variables in (1), as commonly done in econometrics (see Wooldridge, 2002):

(A1) Exogeneity: $\mathbb{E}[X'\epsilon] = 0$,

(A2) Homoscedasticity: $\mathbb{E}[\epsilon^2 \mid X] = \sigma^2$,

(A3) Identification: $\operatorname{rank}(\mathbb{E}[X'X]) = k$;

with $\sigma^2 := \mathbb{E}[\epsilon^2]$, and where $\mathbb{E}[X'X]$
represents a matrix of order $k \times k$. Under assumptions (A2) and (A3),
the OLS estimator behaves asymptotically as follows,

$$\hat\beta_n := (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \xrightarrow{p} \mathbb{E}[X'X]^{-1}\mathbb{E}[X'Y] =: \hat\beta. \tag{2}$$

If assumption (A1) also holds, we say that the model in (1) is exogenous, and
it then follows that the OLS estimator is asymptotically unbiased and
consistent. That is, the limit, $\hat\beta$, can be shown to be equal to the
true parameter, $\beta$. However, if assumption (A1) is violated, then the OLS
estimator is inconsistent. Thus, a model in which the vector of predictors has
non-zero correlations with the error term, $\epsilon$, is referred to as an
endogenous model.

2.2. Two-stage Least Squares (TSLS) 

The limitations of the OLS can be addressed by using a vector of IVs, denoted
$Z$. We will here assume that $Z$ is a random row vector of order
$1 \times l$, with $l \geq k$.

Fig 1. Graphical representation of the IV model described in equations (1) and (3)
in the presence of an unmeasured confounder $U$; where observed and latent variables
are denoted by squares and circles, respectively. This graph corresponds to a two-level
system of equations composed of $Y = X\beta + U\alpha + \epsilon$, and
$X = Z\Gamma + U\alpha + \delta$. When we assume that $\alpha \neq 0$,
condition (A1) is violated, and $X$ becomes endogenous.


This vector of instruments is used in a multivariate linear equation of the form,

$$X = Z\Gamma + \delta, \tag{3}$$

where $\Gamma$ is an unknown matrix of parameters of order $l \times k$, and
$X$ and $\delta$ are random row vectors of order $1 \times k$. As before, we
will usually work with a set of $n$ realizations from this multivariate
linear model expressed as follows,

$$x_i = z_i\Gamma + \delta_i, \qquad \forall\, i = 1,\ldots,n; \tag{4}$$

where $x_i$ and $\delta_i$ are $1 \times k$ row vectors, and $z_i$ is a
$1 \times l$ row vector. This can be concisely expressed using matrix
notation as

$$\mathbf{X} = \mathbf{Z}\Gamma + \mathbf{D},$$

where $\mathbf{Z}$ and $\mathbf{D}$ are matrices of order $n \times l$ and
$n \times k$, respectively. A graphical illustration of the IV model is
provided in figure 1.

When using the two-stage least squares (TSLS) estimator, we will make the
following additional assumptions about the random row vector of instruments:

(A4) Exogeneity: $\mathbb{E}[Z'\epsilon] = 0$,

(A5) Homoscedasticity: $\mathbb{E}[\epsilon^2 \mid Z] = \sigma^2$,

(A6) Identification: $\operatorname{rank}(\mathbb{E}[Z'Z]) = l$, $\operatorname{rank}(\mathbb{E}[Z'X]) = k$;

where, as before, $\sigma^2 := \mathbb{E}[\epsilon^2]$. The TSLS estimator is
then defined as

$$\tilde\beta_n := (\hat{\mathbf{X}}'\hat{\mathbf{X}})^{-1}\hat{\mathbf{X}}'\mathbf{y},$$

with $\hat{\mathbf{X}} := \mathbf{H}_Z\mathbf{X}$ denoting the projection of
the matrix of predictors onto the column space of $\mathbf{Z}$, and where
$\mathbf{H}_Z := \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$ is the hat
matrix of the multivariate regression in equation (4). Under assumptions (A5)
and (A6), the TSLS estimator converges in probability to a non-stochastic
vector, $\tilde\beta$, defined as

$$\tilde\beta := \big(\mathbb{E}[X'Z]\,\mathbb{E}[Z'Z]^{-1}\,\mathbb{E}[Z'X]\big)^{-1}\big(\mathbb{E}[X'Z]\,\mathbb{E}[Z'Z]^{-1}\,\mathbb{E}[Z'Y]\big), \tag{5}$$

such that $\tilde\beta_n \xrightarrow{p} \tilde\beta$, as described in
Wooldridge (2002). Moreover, under assumption (A4), this sequence of
estimators can be shown to be asymptotically unbiased and consistent with
respect to the true vector of parameters, such that $\tilde\beta = \beta$.
However, this gain in unbiasedness comes at the cost of a larger variance
of the TSLS estimator, as we discuss in the next section.
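
The two estimators defined so far can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names `ols`, `tsls` and `simulate`, and the data-generating process (one predictor, two instruments, a single unmeasured confounder, no intercept) are assumptions chosen for illustration only.

```python
import numpy as np

def ols(X, y):
    """OLS estimator (X'X)^{-1} X'y, solved without forming the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def tsls(X, y, Z):
    """TSLS estimator: project X onto the column space of Z, then regress y on the fit."""
    # First stage: X_hat = H_Z X, computed by least squares instead of the n x n hat matrix.
    gamma_hat = np.linalg.lstsq(Z, X, rcond=None)[0]
    X_hat = Z @ gamma_hat
    # Second stage: (X_hat' X_hat)^{-1} X_hat' y.
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

def simulate(n, beta=1.0, alpha=1.0, seed=0):
    """Hypothetical confounded model: U enters both the outcome and predictor equations."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 2))                       # two instruments
    U = rng.normal(size=(n, 1))                       # unmeasured confounder
    X = Z @ np.array([[1.0], [0.5]]) + alpha * U + rng.normal(size=(n, 1))
    y = (beta * X + alpha * U + rng.normal(size=(n, 1))).ravel()
    return y, X, Z

y, X, Z = simulate(n=5000)
beta_ols, beta_tsls = ols(X, y), tsls(X, y, Z)
print("OLS :", beta_ols)    # pulled away from beta = 1 by the confounder
print("TSLS:", beta_tsls)   # close to beta = 1 for large n
```

In this simulated setting the OLS estimate settles away from the true coefficient, while the TSLS estimate does not, at the price of a larger sampling variability.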

2.3. Bias/Variance Trade-off 

Under assumptions (A2-A6), the TSLS estimator is asymptotically unbiased. By
contrast, if assumption (A1) does not hold, then the OLS estimator is
asymptotically biased. However, for finite $n$, the empirical variance of the
TSLS estimator can be shown to be larger than that of the OLS estimator. We
make these observations formal by comparing the variance estimators of the
OLS and TSLS estimators. These are

$$\widehat{\mathrm{Var}}(\hat\beta_n) := \hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}, \quad\text{and}\quad \widehat{\mathrm{Var}}(\tilde\beta_n) := \tilde\sigma^2(\hat{\mathbf{X}}'\hat{\mathbf{X}})^{-1}; \tag{6}$$

with the sample residual sums of squares (RSSs), $\hat\sigma^2$ and
$\tilde\sigma^2$, being given by

$$\hat\sigma^2 := \frac{1}{n-k}\sum_{i=1}^{n}(y_i - x_i\hat\beta_n)^2, \quad\text{and}\quad \tilde\sigma^2 := \frac{1}{n-k}\sum_{i=1}^{n}(y_i - x_i\tilde\beta_n)^2, \tag{7}$$

for the OLS and TSLS estimators, respectively.

More remarkably, one can also approximate the bias of these two estimators.
The theoretical squared bias of a given arbitrary estimator,
$\beta_n^\dagger$, is defined as

$$\mathrm{Bias}^2(\beta_n^\dagger) := (\mathbb{E}[\beta_n^\dagger] - \beta)(\mathbb{E}[\beta_n^\dagger] - \beta)',$$

for every $n$. In the sequel, we will assume that the IVs under scrutiny are
valid instruments, such that assumption (A4) is true. Therefore, it follows
that the TSLS estimator, $\tilde\beta_n$, is known to be consistent, and can
be used to construct a consistent approximation of the bias of any arbitrary
estimator, $\beta_n^\dagger$. For large $n$, it follows that the squared bias
of any such estimator can be consistently estimated by

$$\widehat{\mathrm{Bias}}^2(\beta_n^\dagger) := (\beta_n^\dagger - \tilde\beta_n)(\beta_n^\dagger - \tilde\beta_n)'. \tag{8}$$

Observe that this empirical estimate of the bias gives a value of zero for the
TSLS estimator, for every $n$. This particular choice of empirical bias
estimate can also be seen to be related to the Hausman test, commonly used in
econometrics for testing whether or not the predictors of interest are
exogenous (Hausman, 1978). Indeed, the squared bias in equation (8)
corresponds to the numerator of the Hausman test statistic.

Combining this empirical estimate of the bias with the standard variance
estimators in equation (6), we can formalize our original observation about
the trade-off between the superiority of the TSLS estimator in terms of bias,
and the superiority of the OLS estimator in terms of variance. This result
will motivate our construction of a combined estimator, in which we will
exploit the respective strengths of the OLS and TSLS estimators, denoted by
$\hat\beta_n$ and $\tilde\beta_n$, respectively.

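Proposition 1. Under assumptions (A2-A6), for every $n$, and for every
realization, $\mathbf{y}$, $\mathbf{X}$, and $\mathbf{Z}$, if both
$\mathbf{X}'\mathbf{X}$ and $\hat{\mathbf{X}}'\hat{\mathbf{X}}$ are
invertible, then

(i) $\widehat{\mathrm{Bias}}^2(\hat\beta_n) \succeq \widehat{\mathrm{Bias}}^2(\tilde\beta_n)$,

(ii) $\widehat{\mathrm{Var}}(\hat\beta_n) \preceq \widehat{\mathrm{Var}}(\tilde\beta_n)$;

where $\succeq$ and $\preceq$ denote the positive semidefinite order for
$k \times k$ matrices.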

Note that, in proposition 1, we have required both $\mathbf{X}'\mathbf{X}$
and $\hat{\mathbf{X}}'\hat{\mathbf{X}}$ to be invertible. Indeed, while
assumptions (A3) and (A6) ensure that the stochastic limits of these two
matrices are invertible, this does not guarantee that these matrices will be
invertible for every $n$. Although these inequalities appear to be
well-known, they do not appear to have been formally proved in standard
texts on instrumental variables (see, for instance, Wooldridge, 2002,
Davidson and MacKinnon, 1993, Cameron and Trivedi, 2005). A full proof of
this result is therefore provided in the appendix.
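
The two inequalities of proposition 1 can be checked numerically on any realization. The sketch below is illustrative only: the helper names `variance_estimates`, `is_psd` and `simulate`, the tolerance, and the simulated design are assumptions, not part of the paper. It builds the empirical variances from the residual sums of squares and tests positive semidefiniteness through the smallest eigenvalue.

```python
import numpy as np

def variance_estimates(y, X, Z):
    """OLS and TSLS estimates with their empirical variance matrices."""
    n, k = X.shape
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]        # H_Z X
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_tsls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    s2_ols = np.sum((y - X @ b_ols) ** 2) / (n - k)
    s2_tsls = np.sum((y - X @ b_tsls) ** 2) / (n - k)       # residuals use X, not X_hat
    var_ols = s2_ols * np.linalg.inv(X.T @ X)
    var_tsls = s2_tsls * np.linalg.inv(X_hat.T @ X_hat)
    return b_ols, b_tsls, var_ols, var_tsls

def is_psd(M, tol=1e-10):
    """Positive semidefiniteness of a symmetric matrix, up to a relative tolerance."""
    M = (M + M.T) / 2
    return np.linalg.eigvalsh(M).min() >= -tol * max(1.0, np.abs(M).max())

def simulate(seed, n=40, k=2, l=3):
    """Hypothetical confounded design with k predictors and l instruments."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, l))
    U = rng.normal(size=n)
    X = Z @ rng.normal(size=(l, k)) + U[:, None] + rng.normal(size=(n, k))
    y = X @ np.array([1.0, -0.5]) + U + rng.normal(size=n)
    return y, X, Z

b_ols, b_tsls, var_ols, var_tsls = variance_estimates(*simulate(seed=1))
bias2_ols = np.outer(b_ols - b_tsls, b_ols - b_tsls)   # empirical squared bias; zero for TSLS
print(is_psd(bias2_ols), is_psd(var_tsls - var_ols))
```

The variance inequality holds because the OLS residual sum of squares is minimal over all coefficient vectors, and because projecting the predictors onto the instruments can only shrink the Gram matrix of the predictors.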

Furthermore, the two statements in proposition 1 can also be shown to hold
in the stochastic limit, as described in the following corollary. Note that
this corollary is trivially true for the variances of the OLS and TSLS
estimators, since both of these quantities converge to a zero matrix. A proof
of this result is provided in the appendix.

Corollary 1. Under assumptions (A2-A6),

(i) $\operatorname{plim}_n \widehat{\mathrm{Bias}}^2(\hat\beta_n) \succeq \operatorname{plim}_n \widehat{\mathrm{Bias}}^2(\tilde\beta_n)$,

(ii) $\operatorname{plim}_n \widehat{\mathrm{Var}}(\hat\beta_n) \preceq \operatorname{plim}_n \widehat{\mathrm{Var}}(\tilde\beta_n)$;

where, as before, $\hat\beta_n$ and $\tilde\beta_n$ denote the OLS and TSLS
estimators, respectively.

The inequalities in proposition 1 indicate that it may be fruitful to compare 
the MSEs of these two estimators for finite n. Clearly, since the bias tends 
to dominate the MSE asymptotically, it follows that the TSLS should exhibit 
a smaller level of bias as n goes to infinity. Nonetheless, for finite samples, 
the OLS may yield a smaller MSE than its two-stage counterpart, due to its 
greater efficiency. Therefore, one may try to strike a balance between the relative 
strengths of these two types of estimators, using the sample MSE as a criterion. 

2.4. Convex Least Squares (CLS)

In this section and in the rest of this paper, we now assume that (A2-A6)
hold. In addition, we also assume that the random vectors, $\hat\beta_n$ and
$\tilde\beta_n$, are well-behaved, in the sense that they are elementwise
squared-integrable for every $n$. Under these assumptions, we propose an
estimator, denoted $\bar\beta_n(\pi)$, which is defined as a convex
combination of the OLS and TSLS estimators, such that

$$\bar\beta_n(\pi) := \pi\hat\beta_n + (1 - \pi)\tilde\beta_n, \tag{9}$$

for every $\pi \in [0,1]$. The proportion parameter, $\pi$, controls the
respective contributions of the OLS and TSLS estimators. This parameter is
selected in order to minimize the trace of the theoretical MSE of the
corresponding CLS estimator,

$$\mathrm{MSE}(\bar\beta_n(\pi)) = \mathbb{E}\big[(\bar\beta_n(\pi) - \beta)(\bar\beta_n(\pi) - \beta)'\big],$$

where $\beta \in \mathbb{R}^k$ is the true parameter of interest and the MSE
is a $k \times k$ matrix.

The MSE automatically strikes a trade-off between the unbiasedness of the
TSLS estimator and the efficiency of the OLS estimator. Indeed, this criterion
can be decomposed into a variance and a bias component, such that

$$\mathrm{MSE}(\bar\beta_n(\pi)) = \mathrm{Var}(\bar\beta_n(\pi)) + \mathrm{Bias}^2(\bar\beta_n(\pi)).$$

Therefore, in the light of proposition 1, this criterion constitutes a natural
choice for combining these two types of estimators.

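The MSE of the CLS estimator,
$\mathrm{MSE}(\pi\hat\beta_n + (1 - \pi)\tilde\beta_n)$, can be expressed as
the weighted sum of the MSEs of the OLS and TSLS estimators, as well as a
cross-squared-error (CSE) term between these two estimators,

$$\pi^2\,\mathrm{MSE}(\hat\beta_n) + 2\pi(1 - \pi)\,\mathrm{CSE}(\hat\beta_n, \tilde\beta_n) + (1 - \pi)^2\,\mathrm{MSE}(\tilde\beta_n), \tag{10}$$

where the cross-term is defined as follows,

$$\mathrm{CSE}(\hat\beta_n, \tilde\beta_n) := \mathbb{E}\big[(\hat\beta_n - \beta)(\tilde\beta_n - \beta)'\big].$$

By analogy with the MSE, we can also decompose the CSE into a covariance
term and a squared cross-bias term, denoted
$\mathrm{Bias}^2(\hat\beta_n, \tilde\beta_n)$, such that

$$\mathrm{CSE}(\hat\beta_n, \tilde\beta_n) = \mathrm{Cov}(\hat\beta_n, \tilde\beta_n) + \mathrm{Bias}^2(\hat\beta_n, \tilde\beta_n),$$

where the squared cross-bias term is
$\mathrm{Bias}^2(\hat\beta_n, \tilde\beta_n) := (\mathbb{E}[\hat\beta_n] - \beta)(\mathbb{E}[\tilde\beta_n] - \beta)'$.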

The true (or theoretical) proportion parameter, $\pi$, is defined as the
value that minimizes the trace of the theoretical MSE of the CLS estimator.
Note that we are here considering a sequence of parameters, $\pi_n$, since
this definition may yield a different proportion for different sample sizes.
Therefore, for every $n$, the target proportion parameter is given by

$$\pi_n := \operatorname*{argmin}_{\pi \in [0,1]} \operatorname{tr}\mathrm{MSE}(\bar\beta_n(\pi)). \tag{11}$$

Crucially, this parameter is available in closed-form, and it can also be
shown to be unique, since the trace of the theoretical MSE of
$\bar\beta_n(\pi)$ is a convex function of $\pi$. This statement is made
formal in the following proposition, which is proved using the aforementioned
decomposition of the MSE of the CLS estimator. The proportion parameter is
only non-unique when the square-roots of the traces of the MSEs of the OLS
and TSLS estimators are identical. This quantity, denoted by
$(\operatorname{tr}\mathrm{MSE}(\beta_n^\dagger))^{1/2}$ for every estimator
$\beta_n^\dagger$, will be referred to as the RMSE of $\beta_n^\dagger$ in the
sequel. See appendix A for a proof of this minimization.



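Proposition 2. For every $n$, the proportion parameter defined in equation
(11) is given by

$$\pi_n = \frac{\operatorname{tr}\big(\mathrm{MSE}(\tilde\beta_n) - \mathrm{CSE}(\hat\beta_n, \tilde\beta_n)\big)}{\operatorname{tr}\big(\mathrm{MSE}(\hat\beta_n) - 2\,\mathrm{CSE}(\hat\beta_n, \tilde\beta_n) + \mathrm{MSE}(\tilde\beta_n)\big)}.$$

It is unique whenever the RMSEs of the OLS and TSLS estimators are not
equal.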
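
The closed form in proposition 2 can be recovered from a first-order condition on the trace of the decomposition in equation (10). The following is only a sketch and not the appendix proof: the shorthand $a$, $b$, $c$ and $f$ is introduced here for illustration, with hats denoting the OLS estimator and tildes the TSLS estimator, and the boundary constraint $\pi \in [0,1]$ is ignored.

```latex
\begin{align*}
a &:= \operatorname{tr}\mathrm{MSE}(\hat\beta_n), \quad
b := \operatorname{tr}\mathrm{MSE}(\tilde\beta_n), \quad
c := \operatorname{tr}\mathrm{CSE}(\hat\beta_n, \tilde\beta_n), \\
f(\pi) &:= \pi^2 a + 2\pi(1-\pi)\,c + (1-\pi)^2 b, \\
f'(\pi) &= 2\pi a + 2(1-2\pi)\,c - 2(1-\pi)\,b
         = 2\pi\,(a - 2c + b) - 2\,(b - c), \\
f''(\pi) &= 2\,(a - 2c + b)
          = 2\,\mathbb{E}\,\lVert \hat\beta_n - \tilde\beta_n \rVert^2 \;\ge\; 0, \\
f'(\pi_n) = 0 &\iff \pi_n = \frac{b - c}{a - 2c + b}.
\end{align*}
```

The non-negative second derivative is what makes the criterion convex in $\pi$, and the stationary point matches the ratio stated in the proposition.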

Finally, we can verify that the CLS estimator based on the true proportion
$\pi_n$ has an MSE that is less than or equal to the MSEs of the OLS and TSLS
estimators. Note that this inequality is not immediate from our definition of
$\pi_n$, as we need to control for the additional CSE term in equation (10).
A proof of this proposition is also provided in the appendix.

Proposition 3. The CLS estimator based on the true proportion, $\pi_n$,
satisfies

$$\operatorname{tr}\mathrm{MSE}(\bar\beta_n(\pi_n)) \leq \min\big\{\operatorname{tr}\mathrm{MSE}(\hat\beta_n),\ \operatorname{tr}\mathrm{MSE}(\tilde\beta_n)\big\},$$

for every $n$.

Observe that this result holds in greater generality, since the OLS and TSLS 
estimators could be replaced by other candidate estimators. In the next section, 
we describe how to estimate the proportion parameter in an adaptive manner 
for this particular choice of estimators; whereas in section 3, we consider how 
the CLS can accommodate other estimators. 

2.5. CLS Estimation 

When evaluating $\pi_n$ from a particular data set, we estimate this parameter
by minimizing the trace of an empirical estimate of the theoretical MSE of the
CLS estimator. A consistent estimator of the MSE can be obtained by setting
the true parameter, $\beta$, to be equal to the TSLS estimator,
$\tilde\beta_n$. Thus, for every $\pi \in [0,1]$, our proposed empirical MSE
is given by

$$\widehat{\mathrm{MSE}}(\bar\beta_n(\pi)) = \widehat{\mathrm{Var}}(\bar\beta_n(\pi)) + \widehat{\mathrm{Bias}}^2(\bar\beta_n(\pi)), \tag{12}$$

where $\widehat{\mathrm{Bias}}(\bar\beta_n(\pi)) := \bar\beta_n(\pi) - \tilde\beta_n$.
That is, we here use the TSLS estimator as a consistent estimator of the true
parameter, $\beta$. To approximate the population variance of the CLS
estimator, we can use a combination of the empirical estimates of the
variances of the two estimators of interest, such that

$$\widehat{\mathrm{Var}}(\bar\beta_n(\pi)) = \pi^2\,\widehat{\mathrm{Var}}(\hat\beta_n) + 2\pi(1 - \pi)\,\widehat{\mathrm{Cov}}(\hat\beta_n, \tilde\beta_n) + (1 - \pi)^2\,\widehat{\mathrm{Var}}(\tilde\beta_n), \tag{13}$$

where the empirical variances of the OLS and TSLS estimators have already been
given in equation (6); and where the covariance term takes the following form,

$$\widehat{\mathrm{Cov}}(\hat\beta_n, \tilde\beta_n) := \check\sigma^2(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\hat{\mathbf{X}})(\hat{\mathbf{X}}'\hat{\mathbf{X}})^{-1} = \check\sigma^2(\mathbf{X}'\mathbf{X})^{-1};$$

with, as before, $\hat{\mathbf{X}} := \mathbf{H}_Z\mathbf{X}$, and in which
the second equality is obtained by using the idempotency of $\mathbf{H}_Z$.
Moreover, the cross-RSS, denoted $\check\sigma^2$, is given by

$$\check\sigma^2 := \frac{1}{n-k}\sum_{i=1}^{n}(y_i - x_i\hat\beta_n)(y_i - x_i\tilde\beta_n),$$

which can be compared to the RSSs of the OLS and TSLS estimators in equation
(7).

The second term in equation (12) consists of the empirical bias of the CLS
estimator. As for the empirical variance, the bias can be estimated by using
the TSLS estimator to replace the unknown true parameter, such that

$$\widehat{\mathrm{Bias}}^2(\bar\beta_n(\pi)) = \pi^2\,\widehat{\mathrm{Bias}}^2(\hat\beta_n) + 2\pi(1 - \pi)\,\widehat{\mathrm{Bias}}^2(\hat\beta_n, \tilde\beta_n) + (1 - \pi)^2\,\widehat{\mathrm{Bias}}^2(\tilde\beta_n),$$

where the empirical biases of the OLS and TSLS estimators are estimated as
in equation (12). Since we have set the bias of $\tilde\beta_n$ to zero, it
follows that the cross-bias term also vanishes. Thus, the empirical bias of
the CLS estimator becomes proportional to that of the OLS estimator, such
that we obtain

$$\widehat{\mathrm{Bias}}^2(\bar\beta_n(\pi)) = \pi^2\,\widehat{\mathrm{Bias}}^2(\hat\beta_n). \tag{14}$$

The empirical estimate of the MSE in equation (12) can be shown to be
consistent, as described in the following proposition, which is proved in the
appendix. Observe that this statement holds for any proportion between 0
and 1.

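Proposition 4. For every $\pi \in [0,1]$,
$\bar\beta_n(\pi) \xrightarrow{p} \bar\beta(\pi) := \pi\hat\beta + (1 - \pi)\tilde\beta$,
where $\hat\beta$ and $\tilde\beta$ are defined as in equations (2) and (5),
respectively. Moreover,

$$\widehat{\mathrm{MSE}}(\bar\beta_n(\pi)) \xrightarrow{p} \mathrm{MSE}(\bar\beta(\pi)).$$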

As for the true proportion parameter, $\pi_n$, which minimizes the trace of
the theoretical MSE, the proportion estimator, $\hat\pi_n$, which minimizes
the trace of the empirical MSE, is also available in closed-form. We thus
obtain the following result, as a corollary to proposition 2. Observe that,
since the bias of the TSLS estimator is zero under our estimation framework,
it follows that the MSE of the TSLS reduces to the variance of that estimator,
and that the CSE term reduces to the covariance of the two estimators of
interest.

Corollary 2. The estimator of the proportion parameter,
$\hat\pi_n \in [0,1]$, defined as
$\hat\pi_n := \operatorname{argmin}_{\pi} \operatorname{tr}\widehat{\mathrm{MSE}}(\bar\beta_n(\pi))$,
in which the MSE is defined as in equation (12), satisfies

$$\hat\pi_n = \frac{\operatorname{tr}\big(\widehat{\mathrm{Var}}(\tilde\beta_n) - \widehat{\mathrm{Cov}}(\hat\beta_n, \tilde\beta_n)\big)}{\operatorname{tr}\big(\widehat{\mathrm{Var}}(\tilde\beta_n) - 2\,\widehat{\mathrm{Cov}}(\hat\beta_n, \tilde\beta_n) + \widehat{\mathrm{MSE}}(\hat\beta_n)\big)}.$$
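
The closed-form proportion estimator of corollary 2 can be computed directly from the data. The code below is a hedged sketch under stated assumptions: the function names `components`, `trace_mse_hat` and `pi_hat`, and the simulated design, are illustrative and not taken from the paper. It assembles the empirical variances, the covariance based on the cross-RSS, and the empirical squared bias of the OLS estimator, then evaluates the ratio of traces.

```python
import numpy as np

def components(y, X, Z):
    """OLS and TSLS estimates with their empirical variances and covariance."""
    n, k = X.shape
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]       # projection of X onto col(Z)
    XtX_inv = np.linalg.inv(X.T @ X)
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    b_ols = XtX_inv @ X.T @ y
    b_tsls = XhXh_inv @ X_hat.T @ y
    r_ols, r_tsls = y - X @ b_ols, y - X @ b_tsls
    var_ols = (r_ols @ r_ols) / (n - k) * XtX_inv
    var_tsls = (r_tsls @ r_tsls) / (n - k) * XhXh_inv
    cov = (r_ols @ r_tsls) / (n - k) * XtX_inv             # cross-RSS times (X'X)^{-1}
    return b_ols, b_tsls, var_ols, var_tsls, cov

def trace_mse_hat(pi, b_ols, b_tsls, var_ols, var_tsls, cov):
    """Trace of the empirical MSE of the convex combination at proportion pi."""
    var = pi**2 * var_ols + 2 * pi * (1 - pi) * cov + (1 - pi)**2 * var_tsls
    bias2 = pi**2 * np.outer(b_ols - b_tsls, b_ols - b_tsls)
    return np.trace(var + bias2)

def pi_hat(b_ols, b_tsls, var_ols, var_tsls, cov):
    """Closed-form minimizer of trace_mse_hat (assumes a non-zero denominator)."""
    mse_ols = var_ols + np.outer(b_ols - b_tsls, b_ols - b_tsls)
    return np.trace(var_tsls - cov) / np.trace(var_tsls - 2 * cov + mse_ols)

# Hypothetical confounded data set with k = 2 predictors and l = 3 instruments.
rng = np.random.default_rng(2)
n = 200
Z = rng.normal(size=(n, 3))
U = rng.normal(size=n)
X = Z @ rng.normal(size=(3, 2)) + U[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -0.5]) + U + rng.normal(size=n)

parts = components(y, X, Z)
p = pi_hat(*parts)
beta_cls = p * parts[0] + (1 - p) * parts[1]
print("estimated proportion:", p)
print("CLS estimate:", beta_cls)
```

Because the OLS residuals are orthogonal to the predictors, the cross-RSS coincides numerically with the OLS RSS, which is one way to see why the estimated proportion falls between 0 and 1.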

We have here emphasized the estimation of the proportion parameter. Our
original motivation for constructing the CLS estimator, however, centered on
producing an estimator that would minimize the MSE. This can be achieved
by evaluating the CLS estimator, $\bar\beta_n(\pi)$, at the value of the
estimated proportion, $\hat\pi_n$, thereby producing
$\bar\beta_n(\hat\pi_n)$. We thus conclude this section by verifying
that this particular CLS estimator behaves as expected asymptotically, in the
sense that it is both weakly and MSE consistent.

Proposition 5. Under assumptions (A2-A6), the CLS estimator,
$\bar\beta_n(\hat\pi_n)$, satisfies
(i) $\bar\beta_n(\hat\pi_n) \xrightarrow{p} \beta$, and
(ii) $\bar\beta_n(\hat\pi_n) \xrightarrow{L^2} \beta$.

The proof of this proposition follows from the inequalities reported in
proposition 3, combined with the fact that the TSLS estimator is both weakly
and MSE consistent. See appendix A for details. Observe that the CLS framework
relies on the existence of the first two moments of the TSLS estimator. For
finite $n$, Kinal (1980) has shown that the TSLS estimator only possesses
first and second moments when $l \geq k + 2$. Asymptotically, however, such
moments always exist. As with the TSLS estimator, we are therefore considering
an estimator that is only asymptotically well-identified. This particular
issue is further discussed in section 6.

2.6. Bootstrap CLS Variance 

We now turn to the question of estimating the variance of our proposed CLS
estimator. In equation (13), we have described the variance of
$\bar\beta_n(\pi)$, for every $\pi$. This quantity was then used in our
proposed empirical MSE, in order to obtain a sample estimate of $\pi$.
However, the variance formula in equation (13) does not take into account the
variability associated with the choice of $\hat\pi_n$. The derivation of a
closed-form estimator for the variance of $\bar\beta_n(\hat\pi_n)$ is beyond
the scope of this paper. However, in practice, the variance of the CLS
estimator can be computed using the bootstrap by sampling with replacement
from the triple $(\mathbf{y}, \mathbf{X}, \mathbf{Z})$, and producing $B$
bootstrap samples denoted
$(\mathbf{y}_b^*, \mathbf{X}_b^*, \mathbf{Z}_b^*)$, with $b = 1,\ldots,B$. We
are here adopting the framework described by previous researchers, who have
also used the bootstrap in the context of IV estimation (see for example
Wong, 1996).

Specifically, each bootstrap sample is constructed by sampling $n$ cases with
replacement from the collection of triples $(y_i, x_i, z_i)$, with
$i = 1,\ldots,n$. These bootstrap samples are then used to produce the
bootstrap distribution of the CLS estimator. That is, for each of these
bootstrap samples, we compute the CLS estimator,
$\bar\beta_{nb}^* := \bar\beta_{nb}^*(\hat\pi_{nb}^*)$, which leads to the
following bootstrap variance estimator,

$$\mathrm{Var}^*(\bar\beta_n) := \frac{1}{B-1}\sum_{b=1}^{B}\big(\bar\beta_{nb}^* - \mathbb{E}^*[\bar\beta_n]\big)\big(\bar\beta_{nb}^* - \mathbb{E}^*[\bar\beta_n]\big)',$$

where $\mathbb{E}^*[\bar\beta_n] := \sum_b \bar\beta_{nb}^*/B$ denotes the
bootstrap mean. In our real-world data set application in section 5, we will
report the variance of the CLS and its confidence interval using the
bootstrap.
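
The resampling scheme described in this section can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the names `cls_estimate` and `bootstrap_cls_variance`, the default number of replicates, the $1/(B-1)$ normalization, and the simulated data are all choices made here. The proportion is re-estimated inside every bootstrap replicate, so that its variability is reflected in the resulting variance.

```python
import numpy as np

def cls_estimate(y, X, Z):
    """CLS estimate evaluated at the estimated proportion (closed-form ratio of traces)."""
    n, k = X.shape
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    b_ols, b_tsls = XtX_inv @ X.T @ y, XhXh_inv @ X_hat.T @ y
    r_ols, r_tsls = y - X @ b_ols, y - X @ b_tsls
    var_ols = (r_ols @ r_ols) / (n - k) * XtX_inv
    var_tsls = (r_tsls @ r_tsls) / (n - k) * XhXh_inv
    cov = (r_ols @ r_tsls) / (n - k) * XtX_inv
    mse_ols = var_ols + np.outer(b_ols - b_tsls, b_ols - b_tsls)
    pi = np.trace(var_tsls - cov) / np.trace(var_tsls - 2 * cov + mse_ols)
    return pi * b_ols + (1 - pi) * b_tsls

def bootstrap_cls_variance(y, X, Z, B=200, seed=0):
    """Pairs bootstrap: resample the triples (y_i, x_i, z_i) with replacement, B times."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)           # n cases sampled with replacement
        draws[b] = cls_estimate(y[idx], X[idx], Z[idx])
    centred = draws - draws.mean(axis=0)           # subtract the bootstrap mean
    return centred.T @ centred / (B - 1), draws

# Hypothetical confounded data set with two predictors and three instruments.
rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=(n, 3))
U = rng.normal(size=n)
X = Z @ rng.normal(size=(3, 2)) + U[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -0.5]) + U + rng.normal(size=n)

V, draws = bootstrap_cls_variance(y, X, Z)
print("bootstrap standard errors:", np.sqrt(np.diag(V)))
```

Percentile confidence intervals could be read off the columns of `draws` in the same way, although the paper does not specify which interval construction it uses.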

One of the limitations of our discussion thus far is the presence of a
finite-sample bias in the TSLS estimator. In the next section, we consider
other consistent estimators, which could be articulated within our framework
by being substituted for the TSLS estimator. Indeed, every asymptotically
unbiased estimator could be used to replace the TSLS estimator in the previous
results.

3. Extensions to Other Unbiased Estimators 

Our proposed convex combination of least squares estimators essentially relies
on the choice of an asymptotically unbiased estimator. Under standard
assumptions on the properties of the IVs under scrutiny, the TSLS estimator
satisfies this criterion. This choice was mainly motivated by computational
considerations. The empirical variance for the TSLS estimator is indeed
well-known and can easily be manipulated. We now extend the CLS framework, in
order to accommodate other asymptotically unbiased estimators. The
corresponding MSE can be empirically estimated using the bootstrap at a
greater computational cost, but without additional theoretical complications.
The resulting estimator will thus be referred to as the bootstrap CLS.

3.1. Jackknife IV Estimator 

An ideal replacement for the TSLS estimator is the jackknife IV estimator
(JIVE), which we now describe. This estimator was originally introduced by
Angrist et al. (1995) in order to reduce the finite-sample bias of the TSLS
estimator, when applied to a large number of instruments. Indeed, the TSLS
estimator tends to behave poorly as the number of instruments increases. We
briefly outline this method in the present section. See Angrist et al. (1999)
for an exhaustive description. Let the estimator of the regression parameter
in the first-level equation in model (3) be denoted by

$$\hat\Gamma := (\mathbf{Z}'\mathbf{Z})^{-1}(\mathbf{Z}'\mathbf{X}),$$

which is of order $l \times k$. The matrix of predictors, $\mathbf{X}$,
projected onto the column space of the instruments is then given by
$\hat{\mathbf{X}} = \mathbf{Z}\hat\Gamma$. The jackknife IV estimator
(JIVE) proceeds by estimating each row of $\hat{\mathbf{X}}$ without using the
corresponding data point. That is, the $i$-th row in the jackknife matrix,
$\hat{\mathbf{X}}_J$, is estimated without using the $i$-th row of
$\mathbf{X}$.

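This is conducted as follows. For every $i = 1,\ldots,n$, we first compute

$$\hat\Gamma_{(i)} := (\mathbf{Z}_{(i)}'\mathbf{Z}_{(i)})^{-1}(\mathbf{Z}_{(i)}'\mathbf{X}_{(i)}),$$

where $\mathbf{Z}_{(i)}$ and $\mathbf{X}_{(i)}$ denote matrices $\mathbf{Z}$
and $\mathbf{X}$ after removal of the $i$-th row, such that these two matrices
are of order $(n - 1) \times l$ and $(n - 1) \times k$, respectively.
Then, the matrix $\hat{\mathbf{X}}_J$ is constructed by stacking these
jackknife estimates of $\Gamma$, after they have been pre-multiplied by the
corresponding rows of $\mathbf{Z}$,

$$\hat{\mathbf{X}}_J := \begin{pmatrix} z_1\hat\Gamma_{(1)} \\ \vdots \\ z_n\hat\Gamma_{(n)} \end{pmatrix},$$

where each $z_i$ is an $l$-dimensional row vector. The JIVE estimator is then
obtained by replacing $\hat{\mathbf{X}}$ with $\hat{\mathbf{X}}_J$ in the
standard formula of the TSLS, such that

$$\tilde\beta_J := (\hat{\mathbf{X}}_J'\mathbf{X})^{-1}(\hat{\mathbf{X}}_J'\mathbf{y}).$$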

In this paper, we have additionally made use of the computational formula
suggested by Angrist et al. (1999), in which each row of $\hat{\mathbf{X}}_J$
is calculated using

$$z_i\hat\Gamma_{(i)} = \frac{z_i\hat\Gamma - h_i x_i}{1 - h_i},$$

where $z_i\hat\Gamma_{(i)}$, $z_i\hat\Gamma$ and $x_i$ are $k$-dimensional row
vectors; and with $h_i$ denoting the leverage of the corresponding data point
in the first-level equation of our model, such that each $h_i$ is defined as
$z_i(\mathbf{Z}'\mathbf{Z})^{-1}z_i'$.
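
The leave-one-out construction and its leverage-based shortcut can be compared numerically. The sketch below is illustrative: the function names are hypothetical, the simulated data are arbitrary, and the final estimator assumes that the JIVE takes the form of the jackknife fitted values cross-multiplied with the original predictors, as in Angrist et al. (1999).

```python
import numpy as np

def jive_fitted_loo(X, Z):
    """Leave-one-out first-stage fits by brute force (n separate regressions)."""
    n = Z.shape[0]
    X_J = np.empty_like(X, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i                               # drop the i-th row
        gamma_i = np.linalg.lstsq(Z[keep], X[keep], rcond=None)[0]
        X_J[i] = Z[i] @ gamma_i
    return X_J

def jive_fitted_fast(X, Z):
    """Same fits via the leverage formula (z_i Gamma_hat - h_i x_i) / (1 - h_i)."""
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    fitted = Z @ (ZtZ_inv @ Z.T @ X)                           # rows z_i Gamma_hat
    h = np.einsum("ij,jk,ik->i", Z, ZtZ_inv, Z)                # h_i = z_i (Z'Z)^{-1} z_i'
    return (fitted - h[:, None] * X) / (1.0 - h[:, None])

def jive(y, X, Z):
    """JIVE estimate using the jackknife fitted values as instruments for X."""
    X_J = jive_fitted_fast(X, Z)
    return np.linalg.solve(X_J.T @ X, X_J.T @ y)

# Hypothetical data set with two predictors and four instruments.
rng = np.random.default_rng(4)
n = 60
Z = rng.normal(size=(n, 4))
U = rng.normal(size=n)
X = Z @ rng.normal(size=(4, 2)) + U[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -0.5]) + U + rng.normal(size=n)

print(np.allclose(jive_fitted_loo(X, Z), jive_fitted_fast(X, Z)))
print("JIVE estimate:", jive(y, X, Z))
```

The shortcut replaces $n$ separate regressions with a single one, which matters when the JIVE is recomputed inside each bootstrap replicate.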


3.2. Bootstrap CLS Estimation 

When replacing the TSLS estimator with an arbitrary estimator, such as the 
JIVE, some of the quantities required for estimating the proportion, 7r„, need not 
be available in closed-form. However, such quantities can be straightforwardly 
estimated using the bootstrap, as was done for the variance of the CLS estimator 
in section 2.6. 

We can indeed approximate the unknown joint distribution, $F(Y, X, Z)$,
with its bootstrap estimate, $F^*$, using the straightforward sampling scheme
described in section 2.6. As before, we thus generate $B$ bootstrap samples,
denoted $(y_b^*, X_b^*, Z_b^*)$, from $F^*$. These bootstrap samples are then
used to produce the bootstrap distributions of the OLS estimator and the
unbiased estimator of interest, such as the JIVE; and the first and second
moments of these estimators are computed. Thus, for every unbiased estimator,
$\beta_n^\dagger$, and given the OLS estimator, $\hat{\beta}_n$, we construct a
bootstrap estimate of the MSE of the corresponding CLS estimator,
$\bar{\beta}_n(\pi)$, such that for every $\pi$, we define

$$ \mathrm{MSE}^*(\bar{\beta}_n(\pi)) := E^*\big[(\bar{\beta}_n(\pi) - E^*[\beta_n^\dagger])(\bar{\beta}_n(\pi) - E^*[\beta_n^\dagger])'\big]. $$

As in section 2.6, the operator, $E^*$, denotes the expectation over the
bootstrap estimate of $F$. Similarly to the MSE decomposition in equation (10),
the bootstrap estimate of the MSE can be decomposed into the following
components,

$$ \pi^2\,\mathrm{MSE}^*(\hat{\beta}_n) + 2\pi(1-\pi)\,\mathrm{CSE}^*(\hat{\beta}_n, \beta_n^\dagger) + (1-\pi)^2\,\mathrm{Var}^*(\beta_n^\dagger), $$

where the bootstrap estimate of the MSE of $\beta_n^\dagger$ was reduced to
$\mathrm{Var}^*(\beta_n^\dagger)$, since the estimator, $\beta_n^\dagger$, is
assumed to be unbiased for every $n$. Moreover, as in equation (14), the
bootstrap estimate of the bias of the CLS estimator is proportional to the
bootstrap bias of the OLS estimator, such that we have

$$ \mathrm{Bias}^*(\bar{\beta}_n(\pi)) = \pi\,\mathrm{Bias}^*(\hat{\beta}_n). $$
The bootstrap estimate of the proportion, denoted $\hat{\pi}_n^*$, is then given
by a formula analogous to the one described in corollary 2, in which each
empirical moment is replaced by its bootstrap equivalent. This allows us to
show that, for every choice of asymptotically unbiased estimator,
$\beta_n^\dagger$, the resulting bootstrap CLS estimator,
$\bar{\beta}_n(\hat{\pi}_n^*)$, achieves minimal bootstrap MSE amongst its
constituent estimators. A proof of the following corollary is provided in the
appendix. It relies on the same arguments employed in the proof of the
optimality of the CLS estimator in proposition 3.

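Corollary 3. For every asymptotically unbiased estimator $\beta_n^\dagger$, the
bootstrap CLS estimator of $\hat{\beta}_n$ and $\beta_n^\dagger$, based on the
bootstrap proportion, $\hat{\pi}_n^*$, satisfies

$$ \mathrm{tr}\,\mathrm{MSE}^*(\bar{\beta}_n(\hat{\pi}_n^*)) \leq \mathrm{tr}\,\min\{\mathrm{MSE}^*(\hat{\beta}_n), \mathrm{MSE}^*(\beta_n^\dagger)\}, $$

for every $n$.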
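The bootstrap procedure of this section can be sketched for a scalar parameter
as follows. This is only an illustration under several assumptions: the
sampling scheme of section 2.6 is taken to be a resampling of rows with
replacement, the TSLS estimator stands in for the unbiased estimator, and the
proportion is obtained by directly minimizing the quadratic decomposition
above over [0, 1] rather than by the closed-form expression of corollary 2,
which is not reproduced in this section. All function names are hypothetical.

```python
import random

def ols(y, x):
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def tsls(y, x, z):
    # With one instrument and one predictor, TSLS reduces to (z'y) / (z'x).
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))

def bootstrap_cls(y, x, z, n_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(y)
    b_ols, b_iv = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]    # resample rows with replacement
        yb = [y[i] for i in idx]
        xb = [x[i] for i in idx]
        zb = [z[i] for i in idx]
        b_ols.append(ols(yb, xb))
        b_iv.append(tsls(yb, xb, zb))
    centre = sum(b_iv) / n_boot                        # E*[beta_dagger]
    a = sum((b - centre) ** 2 for b in b_ols) / n_boot # MSE*(OLS)
    v = sum((b - centre) ** 2 for b in b_iv) / n_boot  # Var*(beta_dagger)
    c = sum((p - centre) * (q - centre)
            for p, q in zip(b_ols, b_iv)) / n_boot     # CSE*(OLS, beta_dagger)
    # Minimizer of pi^2 a + 2 pi (1 - pi) c + (1 - pi)^2 v, clipped to [0, 1].
    denom = a - 2.0 * c + v
    pi = 0.0 if denom <= 0.0 else min(1.0, max(0.0, (v - c) / denom))
    mse = pi ** 2 * a + 2.0 * pi * (1.0 - pi) * c + (1.0 - pi) ** 2 * v
    estimate = pi * ols(y, x) + (1.0 - pi) * tsls(y, x, z)
    return pi, mse, a, v, estimate

# Illustrative data (parameter values are assumptions).
rng = random.Random(2)
n = 300
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + 0.5 * ui + rng.gauss(0, 0.7) for zi, ui in zip(z, u)]
y = [0.5 * xi + 0.5 * ui + rng.gauss(0, 0.5) for xi, ui in zip(x, u)]

pi_star, mse_cls, mse_ols, var_iv, beta_cls = bootstrap_cls(y, x, z)
print("bootstrap proportion:", pi_star)
print("bootstrap MSEs (CLS, OLS, IV):", mse_cls, mse_ols, var_iv)
```

Because the clipped minimizer of a convex quadratic on [0, 1] cannot do worse
than either endpoint, the bootstrap MSE of the combination never exceeds that
of its two constituents, which is the scalar counterpart of corollary 3.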

4. Data Simulations 

We here produce synthetic data sets with different numbers of instruments. All
of the models considered in this section are based on a univariate endogenous
variable, $X$, without any additional covariate in the second-level equation. In
Model I, we describe a simple Gaussian model with a single valid instrument;
whereas in Model II, we consider a similar statistical model comprising $l = 10$
uncorrelated instruments.

4.1. Simulation Model I

Synthetic data sets were created from the following two-level model. We are
here focusing on a univariate model composed of a single predictor, $X$, and a
single instrument, $Z$. For every $i = 1, \ldots, n$, the two levels of the
model are

$$ \begin{aligned} Y_i &= X_i\beta + U_i\alpha + \epsilon_i, \\ X_i &= Z_i\gamma + U_i\alpha + \delta_i, \end{aligned} \qquad (15) $$

where $\alpha$ controls the degree of endogeneity of $X$, and $\gamma$ controls
the amount of covariance between $X$ and the instrument $Z$, such that $\gamma$
can be interpreted as the strength of the instrument. We wish to keep the
marginal variances of the $Y_i$'s and $X_i$'s constant, while varying the values
of $\alpha$ and $\gamma$. This is achieved by defining the variances of the
error terms, $\epsilon_i$ and $\delta_i$, as functions of $\alpha$ and $\gamma$.
In doing so, we simplify the interpretation of $\beta$, which becomes a
standardized regression coefficient, whenever $\gamma = 0$. Throughout these
simulations, the true parameter of interest will be set to be $\beta = 1/2$. A
graphical representation of this model has been given in figure 1.

The model is thus standardized by setting the marginal variances of the $Y_i$'s
and $X_i$'s to one, such that $\mathrm{Var}(Y_i) = \mathrm{Var}(X_i) = 1$; and
by generating the $Z_i$'s and $U_i$'s from a standard normal distribution, such
that

$$ Z_i, U_i \sim N(0, 1), \qquad \forall\, i = 1, \ldots, n. $$

The marginal variances of these random variables will be denoted by
$\sigma_Z^2 := \mathrm{Var}[Z_i]$ and $\sigma_U^2 := \mathrm{Var}[U_i]$,
respectively. The two remaining variances can then be defined as functions of
the different regression parameters. For the second-level equation, we have

$$ \epsilon_i \sim N(0, \sigma_\epsilon^2(\alpha)), \qquad \sigma_\epsilon^2(\alpha) := 3/4 - 2\alpha^2, \qquad (16) $$

which follows from the constraint $\mathrm{Var}[Y_i] = 1$, and from the
decomposition,

$$ \mathrm{Var}(Y_i) = \beta^2\,\mathrm{Var}(X_i) + \alpha^2\,\mathrm{Var}(U_i) + 2\beta\alpha\,\mathrm{Cov}(X_i, U_i) + \mathrm{Var}(\epsilon_i). $$

Using the linear independence of $Z$ and $U$, the covariance term becomes
$\mathrm{Cov}(X_i, U_i) = \alpha\sigma_U^2$. Moreover, from our choice of
variances for $X_i$ and $U_i$, we also obtain
$\mathrm{Var}(Y_i) = \beta^2 + \alpha^2 + 2\beta\alpha^2 + \sigma_\epsilon^2$.
Fixing the variance of $Y_i$ to unity and using our choice of $\beta$, this
yields the definition of $\sigma_\epsilon^2(\alpha)$ given in equation (16).
Moreover, observe that the positiveness of $\sigma_\epsilon^2$ produces an upper
bound for $\alpha$, which is given by $\alpha < \sqrt{3/8}$.

Similarly, we can ensure that the marginal variances of the $X_i$'s are also
constant, irrespective of the choice of $\alpha$ and $\gamma$, by controlling
the variances of the $\delta_i$'s. Thus, we set

$$ \delta_i \sim N(0, \sigma_\delta^2(\alpha, \gamma)), \qquad \sigma_\delta^2(\alpha, \gamma) := 1 - (\gamma^2 + \alpha^2). $$

This specification ensures that the variances of the $X_i$'s are constant with
$\sigma_X^2 = 1$. That is, since by assumption $\mathrm{Cov}(Z_i, U_i) = 0$, and
using the fact that for uncorrelated variables, the Bienaymé formula states that
$\mathrm{Var}(\sum_j X_j) = \sum_j \mathrm{Var}(X_j)$; it then follows that for
every $i = 1, \ldots, n$, we obtain

$$ \mathrm{Var}(X_i) = \gamma^2\,\mathrm{Var}(Z_i) + \alpha^2\,\mathrm{Var}(U_i) + \mathrm{Var}(\delta_i), $$

which gives $\mathrm{Var}(\delta_i) = 1 - (\gamma^2 + \alpha^2)$, as required.
Moreover, note that we must have $\gamma < \sqrt{1 - \alpha^2}$ in order to
ensure that $\sigma_\delta^2 > 0$. Using our previous bound for $\alpha$, which
states that $\alpha < \sqrt{3/8}$, it then follows that $\gamma < \sqrt{5/8}$.

Altogether, we have therefore fixed the variances of the $Y_i$'s, $X_i$'s,
$U_i$'s, and $Z_i$'s to unity; and by assumption, the instrument is deemed valid
in the sense that $\mathrm{Cov}(Z_i, U_i) = 0$. From these standardizations, it
follows that for every $\alpha \in [0, \sqrt{3/8})$, and for every
$\gamma \in (0, \sqrt{1 - \alpha^2})$, the correlations of the $X_i$'s with the
$U_i$'s and $Z_i$'s are controlled by the two simulation parameters, $\alpha$
and $\gamma$:

$$ \mathrm{Cor}(X_i, U_i) = \alpha, \qquad \text{and} \qquad \mathrm{Cor}(X_i, Z_i) = \gamma, $$

which respectively represent the magnitude of the confounding and the strength
of the instrument. In addition, the correlations of the $Y_i$'s with the
$U_i$'s and the $Z_i$'s are also controlled by a combination of these
parameters. These correlations are respectively given by
$\mathrm{Cor}(Y_i, U_i) = \beta\alpha + \alpha$, and
$\mathrm{Cor}(Y_i, Z_i) = \beta\gamma$. Finally, the correlation between the
outcome and the endogenous variable satisfies

$$ \mathrm{Cor}(Y_i, X_i) = \beta + \alpha^2. $$

Therefore, in the absence of any confounding effect, $\beta$ can be interpreted
as the correlation coefficient between the $Y_i$'s and the $X_i$'s.
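The standardization of Model I can be verified by simulation. The sketch below
is an illustration rather than the authors' code: the function names and the
chosen values of the simulation parameters are assumptions, and the error
variance of the second-level equation is written for a general beta, which
reduces to 3/4 - 2 alpha^2 when beta = 1/2.

```python
import math
import random

def simulate_model_one(n, alpha, gamma, beta=0.5, seed=0):
    """Draw n observations (y, x, z, u) from the two-level model (15)."""
    # Error variances chosen so that Var(Y_i) = Var(X_i) = 1.
    var_eps = 1.0 - beta ** 2 - alpha ** 2 - 2.0 * beta * alpha ** 2
    var_delta = 1.0 - (gamma ** 2 + alpha ** 2)
    if var_eps <= 0.0 or var_delta <= 0.0:
        raise ValueError("alpha and gamma must leave both error variances positive")
    rng = random.Random(seed)
    sd_eps, sd_delta = math.sqrt(var_eps), math.sqrt(var_delta)
    y, x, z, u = [], [], [], []
    for _ in range(n):
        zi, ui = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = zi * gamma + ui * alpha + rng.gauss(0, sd_delta)
        yi = xi * beta + ui * alpha + rng.gauss(0, sd_eps)
        y.append(yi); x.append(xi); z.append(zi); u.append(ui)
    return y, x, z, u

def variance(a):
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a) / len(a)

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)
    return cov / math.sqrt(variance(a) * variance(b))

alpha, gamma, beta = 0.5, 0.5, 0.5
y, x, z, u = simulate_model_one(50000, alpha, gamma, beta, seed=4)
var_x, var_y = variance(x), variance(y)
cor_xu, cor_xz, cor_yx = correlation(x, u), correlation(x, z), correlation(y, x)
print("Var(X), Var(Y):", var_x, var_y)                       # both near 1
print("Cor(X,U), Cor(X,Z):", cor_xu, cor_xz)                 # near alpha and gamma
print("Cor(Y,X):", cor_yx, "target:", beta + alpha ** 2)
```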

Fig 2. Monte Carlo distributions of the estimators’ values under three different levels of
confounding, $\alpha = \mathrm{Cor}(X_i, U_i)$; and for three different levels of instrument’s strength,
$\gamma = \mathrm{Cor}(X_i, Z_i)$. In each panel, the sample size varies between $n = 100$ and $n = 500$.
We here compare the OLS, TSLS and CLS estimators with respect to the true
parameter $\beta = 1/2$, whose value is indicated by a dashed line. These simulations are
based on 10® iterations for each scenario. The boxplots are here centered at the median, 
and the lower and upper hinges correspond to the first and third quartiles.


4.2. Simulation Model II

We extend Model I to the case of several valid instruments. For convenience,
these instruments are assumed to be uncorrelated. The second-level equation is
taken to be identical to the second-level equation in equation (15). The
first-level equation, by contrast, now includes a vector of instruments, such
that

$$ X_i = Z_i\lambda + U_i\alpha + \delta_i, \qquad (17) $$

for every $i = 1, \ldots, n$; and where $Z_i := (Z_{i1}, \ldots, Z_{il})$ is a
row vector of $l$ uncorrelated instruments. The strengths of each of these
instruments are controlled by a column vector of parameters denoted by
$\lambda := (\lambda_1, \ldots, \lambda_l)'$. We here assume that the
$\lambda_j$'s are held constant such that $\lambda_j := \lambda$, for every
$j = 1, \ldots, l$. As for Model I, we fix
$\mathrm{Var}(Y_i) = \mathrm{Var}(X_i) = 1$, and generate the $Z_{ij}$'s and the
$U_i$'s from a standard normal distribution, with $\sigma_Z^2 := 1$ and
$\sigma_U^2 := 1$, respectively. The formula for the error variance of the
second-level equation, $\sigma_\epsilon^2(\alpha)$, is identical to the one used
in Model I.

The error variance for the first-level equation, denoted by $\sigma_\delta^2$, is also
controlled by a parameter $\gamma$, which is here defined to be the coefficient of multiple
[Figure 3. Panels: $\alpha = 0$, $\alpha = 0.25$, $\alpha = 0.5$; vertical axis: estimators' RMSE; horizontal axis: sample size.]

Fig 3. Monte Carlo estimates of the root mean squared errors (RMSEs) of the three
estimators of interest under the simulation scenarios described in figure 2. As predicted,
the RMSE of the proposed CLS method strikes a trade-off between its two constituent
estimators. Indeed, under small $\alpha$, the CLS’s RMSE tends towards the RMSE of
the OLS estimator; whereas under large $\gamma$, it tends towards the RMSE of the TSLS
estimator.


correlation between each $X_i$ and the $i$-th vector of $Z_{ij}$'s. As before,
using the Bienaymé formula, the variance of each $X_i$ can be expanded as
follows,

$$ \mathrm{Var}(X_i) = \sum_{j=1}^{l} \lambda_j^2\,\mathrm{Var}(Z_{ij}) + \alpha^2\,\mathrm{Var}(U_i) + \mathrm{Var}(\delta_i), $$

which simplifies to $\mathrm{Var}(X_i) = l\lambda^2 + \alpha^2 + \sigma_\delta^2$,
by our choice of variances for the $Z_{ij}$'s and $U_i$'s. When specifying
$\mathrm{Var}(X_i) = 1$, this gives
$\sigma_\delta^2(\alpha, \gamma, l) = 1 - (l\lambda^2 + \alpha^2)$, and
moreover, when enforcing the positiveness of $\sigma_\delta^2$, we obtain
$\lambda < \sqrt{(1 - \alpha^2)/l}$. Next, if we choose
$\lambda := \gamma/\sqrt{l}$, then the parameter $\gamma$ can be seen to
correspond to the multiple correlation coefficient between each $X_i$ and the
vector of $Z_{ij}$'s. Indeed, we have $\gamma^2 = r'R^{-1}r$, in which
$r := (r_{xz_1}, \ldots, r_{xz_l})'$ with
$r_{xz_j} := \mathrm{Cor}(X_i, Z_{ij}) = \gamma/\sqrt{l}$; and where $R$ is the
correlation matrix of the vector of $Z_{ij}$'s, such that
$R_{ab} := \mathrm{Cor}(Z_{ia}, Z_{ib})$, for every $a, b = 1, \ldots, l$.
Thus, as in Model I, we again obtain the upper bound,
$\gamma < \sqrt{1 - \alpha^2}$, as well as

$$ \sigma_\delta^2(\alpha, \gamma) = 1 - (\gamma^2 + \alpha^2). $$
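The role of gamma as a multiple correlation coefficient can be illustrated by
simulating the first-level equation of Model II. The following sketch assumes
lambda = gamma / sqrt(l), and uses the fact that, for uncorrelated instruments
(R equal to the identity), the squared multiple correlation reduces to the sum
of the squared marginal correlations; in a finite sample this holds only
approximately. Function names and parameter values are illustrative.

```python
import math
import random

def simulate_first_level(n, alpha, gamma, n_inst=10, seed=0):
    """Draw (x, z) from the first-level equation (17) with equal strengths."""
    lam = gamma / math.sqrt(n_inst)                   # common strength lambda
    var_delta = 1.0 - (gamma ** 2 + alpha ** 2)
    if var_delta <= 0.0:
        raise ValueError("need gamma < sqrt(1 - alpha^2)")
    rng = random.Random(seed)
    sd_delta = math.sqrt(var_delta)
    x, z = [], []
    for _ in range(n):
        zi = [rng.gauss(0, 1) for _ in range(n_inst)]
        ui = rng.gauss(0, 1)
        x.append(lam * sum(zi) + alpha * ui + rng.gauss(0, sd_delta))
        z.append(zi)
    return x, z

def variance(a):
    m = sum(a) / len(a)
    return sum((v - m) ** 2 for v in a) / len(a)

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)
    return cov / math.sqrt(variance(a) * variance(b))

alpha, gamma, n_inst = 0.5, 0.6, 10
x, z = simulate_first_level(20000, alpha, gamma, n_inst, seed=5)
var_x = variance(x)
# Sum of squared marginal correlations, one per instrument column.
r_squared = sum(correlation(x, [row[j] for row in z]) ** 2 for j in range(n_inst))
print("Var(X):", var_x)                                # near 1
print("sum of squared correlations:", r_squared, "target gamma^2:", gamma ** 2)
```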
Fig 4. Monte Carlo estimates of the absolute value of the bias of the three estimators
of interest, under the simulation scenarios described in figure 2. Observe that the three
estimators exhibit no bias, when no confounding is present. That is, the OLS estimator
exhibits less bias, when $\alpha = \mathrm{Cor}(X_i, U_i)$ is low. Also, note that the finite-sample bias
of the TSLS estimator tends to diminish with large sample sizes. This behavior is
especially visible for large $\alpha$’s.


4.3. Monte Carlo Summary Statistics

We evaluate the finite-sample performance of the estimators of interest by
comparing the Monte Carlo estimates of three different population statistics.
For every candidate estimator, $\beta_n^\dagger$, its Monte Carlo distribution
is given by the following empirical distribution function (EDF),

$$ \hat{F}(b) := T^{-1} \sum_{t=1}^{T} I\{\beta_{n,t}^\dagger \leq b\}, $$

where $\beta_{n,t}^\dagger$ is the value of the estimator in the $t$-th
simulated data set, and $I\{f_t\}$ denotes the indicator function taking a value
of one, if $f_t$ is true,
and zero otherwise. For each simulation scenario, we draw T := 10^ realizations 
from the two models described in sections 4.1 and 4.2. 

Using these simulated samples, we compute the Monte Carlo estimates of
the bias, variance, and MSE; denoted by $\mathrm{Bias}_{\hat{F}}(\beta_n^\dagger)$,
$\mathrm{Var}_{\hat{F}}(\beta_n^\dagger)$, and $\mathrm{MSE}_{\hat{F}}(\beta_n^\dagger)$,
respectively. In figure 2, we have reported the Monte Carlo distribution of the
three estimators of interest under $T$ simulations from Model I; whereas in figures
3, 4, and 5 we have reported the Monte Carlo MSE, squared bias, and variance,
respectively. The quantities in these three figures have been square-rooted in
order to facilitate the comparison of these statistics with the estimators’ values
in figure 2. Similarly, in figure 6, we have reported the Monte Carlo distributions
of the TSLS, JIVE, as well as their CLS counterparts; with the Monte Carlo
estimates of their absolute bias being described in figure 7.
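These Monte Carlo summaries can be sketched as follows for the OLS and TSLS
estimators under Model I. The number of replications, the sample size and the
parameter values used here are illustrative choices and are deliberately
smaller than those of the paper; the function names are hypothetical.

```python
import math
import random

def draw_estimates(n, alpha, gamma, beta, rng):
    """Simulate one data set from Model I and return the (OLS, TSLS) pair."""
    sd_eps = math.sqrt(1.0 - beta ** 2 - alpha ** 2 - 2.0 * beta * alpha ** 2)
    sd_delta = math.sqrt(1.0 - gamma ** 2 - alpha ** 2)
    sxx = sxy = szx = szy = 0.0
    for _ in range(n):
        z, u = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z * gamma + u * alpha + rng.gauss(0, sd_delta)
        y = x * beta + u * alpha + rng.gauss(0, sd_eps)
        sxx += x * x; sxy += x * y; szx += z * x; szy += z * y
    return sxy / sxx, szy / szx

def summarize(estimates, truth):
    """Monte Carlo bias, variance and MSE over the empirical distribution."""
    t = len(estimates)
    mean = sum(estimates) / t
    bias = mean - truth
    var = sum((b - mean) ** 2 for b in estimates) / t
    mse = sum((b - truth) ** 2 for b in estimates) / t
    return bias, var, mse

beta, alpha, gamma = 0.5, 0.5, 0.5
rng = random.Random(6)
pairs = [draw_estimates(100, alpha, gamma, beta, rng) for _ in range(500)]
bias_ols, var_ols, mse_ols = summarize([p[0] for p in pairs], beta)
bias_iv, var_iv, mse_iv = summarize([p[1] for p in pairs], beta)
# Square roots put all three summaries on the scale of the estimator.
print("OLS  |bias|, sd, RMSE:", abs(bias_ols), math.sqrt(var_ols), math.sqrt(mse_ols))
print("TSLS |bias|, sd, RMSE:", abs(bias_iv), math.sqrt(var_iv), math.sqrt(mse_iv))
```

With these definitions the Monte Carlo MSE equals the Monte Carlo variance plus
the squared Monte Carlo bias exactly, which is why the three square-rooted
summaries can be read jointly.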


[Figure 5. Vertical axis: estimators’ standard deviation; horizontal axis: sample size; legend: OLS, TSLS, CLS.]


Fig 5. Monte Carlo standard deviation of the three estimators under scrutiny, under
the scenarios described in figure 2. By corollary 1, the variance of the OLS estimator is
always smaller than its competitors, as verified in these simulations. Moreover, observe
that the variance of the TSLS estimator increases as the strength of the instrument,
$\gamma = \mathrm{Cor}(X_i, Z_i)$, decreases.


4.4. Results for Model I (Single Instrument)

The behavior of the CLS was found to be mainly controlled by the strength
of the instrument, $Z$. When the instrument was strongly correlated with the
predictor $X$ (that is, for large values of $\gamma = \mathrm{Cor}(X_i, Z_i)$),
the values of the CLS estimator were close to the ones of the TSLS estimator,
as can be observed in the last row of figure 2. By contrast, when the
instrument was weak (that is, for small values of $\gamma$), the values of the
CLS estimator were closer to the ones of the OLS estimator, as can be seen in
the first row of figure 2.

Proposition 3 stated that the MSEs of the OLS and TSLS estimators are
bounded below by the MSE of the CLS estimator when the true proportion $\pi_n$ is
known. These Monte Carlo simulations appear to support a partial analog of this
result when $\pi_n$ is evaluated from the data. Indeed, on one hand, figure 3 shows
that the MSE of the OLS estimator tends to be smaller than the MSE of the CLS
estimator, when no confounding is present; thereby showing that proposition 3
does not strictly hold when $\pi_n$ is estimated from the data. However, on the other
hand, one can also observe from figure 3 that the Monte Carlo MSE of the CLS
estimator is smaller than or equal to the one of the TSLS estimator under all
considered scenarios. Thus, it seems that a weaker version of proposition 3 may
hold for estimated $\pi_n$, which would solely pertain to a comparison between the
Fig 6. Comparison of the TSLS and JIVE estimators with their CLS counterparts,
using $l = 10$ uncorrelated instruments, whose multiple correlation with the outcome
is given by $\gamma$, and with $\alpha$ measuring the strength of bias, as in Model I. Data have
been simulated using Model II from section 4.2. The bootstrap estimate of $\pi_n$ for the
CLS-JIVE is based on $B = 100$ resamples, as described in section 3.2 on bootstrap
CLS estimation. All scenarios have been repeated over 10® Monte Carlo iterations. 


behavior of the CLS and TSLS estimators. In particular, observe that for strong
instruments (i.e. for large values of $\gamma$), the CLS estimator behaves as well as the
TSLS estimator, whereas for weak instruments (i.e. small values of $\gamma$), the CLS
estimator outperforms the TSLS estimator.

The behavior of these estimators can be better understood by separately
considering their bias and variance. In figures 4 and 5, we have respectively
reported the Monte Carlo estimates of the bias and variance of the OLS, TSLS
and CLS estimators. Naturally, the bias of the three estimators tends to increase
with the strength of the confounder, which is controlled by $\alpha = \mathrm{Cor}(X_i, U_i)$.
In particular, the bias of the OLS estimator becomes large as $\alpha$ increases. By
contrast, the bias of the TSLS estimator remains small for every value of $\alpha$. In
fact, as stated in Corollary 1(i), the bias of the OLS estimator is bounded from
below by the bias of the TSLS estimator. Moreover, the finite-sample bias of the
TSLS estimator can also be observed to decrease as the sample size increases.
As predicted, the bias of the CLS estimator lies between the ones of
the two other estimators; and the bias of the CLS estimator approaches the one
of the TSLS estimator, as the strength of the instrument increases.

Figure 5 describes the behavior of the variance of the estimators of interest 
under our various simulation scenarios. The variance of the three estimators 
Fig 7. Monte Carlo estimates of the absolute value of the bias of four estimators of
interest for Model II with $l = 10$ uncorrelated instruments, whose multiple correlation
with the outcome is given by $\gamma$, and with the strength of the bias being denoted by $\alpha$.
As in figure 6, all scenarios have been repeated over 10® Monte Carlo iterations.


tends to decrease as the sample size increases. This downward trend is especially
noticeable for the TSLS estimator, which exhibits a high level of variability, when
the instrument is weak (i.e. for small values of $\gamma$). As predicted by Corollary 1(ii),
the variance of the TSLS estimator can be observed to be bounded from below
by the variance of the OLS estimator. In the presence of weak instruments,
the CLS estimator’s variance is close to the one of the OLS estimator. As $\gamma$
increases, however, the variance of the CLS estimator converges to the one of
the TSLS estimator.

4.5. Results for Model II (Multiple Instruments)

Our second set of simulations aimed to assess whether the use of an estimator 
possessing better finite-sample properties could be incorporated into the CLS 
framework. In figure 4, we have already seen that the TSLS estimator suffers 
from a substantial finite-sample bias. This was found to be especially the case 
when the instruments of interest are comparatively weak, and the bias is large. 
In particular, previous authors have shown that the TSLS estimator’s bias tends
to be especially large, when several instruments are used (Angrist et al., 1995).
This limitation of the TSLS estimator has been addressed in the literature by
the introduction of the JIVE, which was described in section 3.1. This second
set of simulations is thus based on $l = 10$ uncorrelated instruments, and allows us
to compare the relative merits of using either the TSLS estimator or the JIVE
within the CLS framework. Consequently, we will refer to the CLS estimators
that use the TSLS and the JIVE as their unbiased component as the CLS-TSLS and
CLS-JIVE estimators, respectively.

As predicted, the JIVE performs better than the TSLS estimator, when used
in conjunction with strong instruments, and in the presence of a substantial
amount of confounding (i.e. $\alpha > 0.25$), as can be seen from figures 6 and 7.
Note, however, that for weak instruments (i.e. when the multiple correlation
coefficient is $\gamma = 0.1$), the JIVE’s variance is very large. The TSLS estimator
should therefore be favored under these scenarios, if one’s choice of estimator is
motivated by a desire to minimize the MSE.

The benefits of using the JIVE translate into corresponding improvements
when using the CLS-JIVE. This relationship is especially visible when
considering the bottom right panel of figure 7. Under a set of strong instruments
(i.e. with a large multiple correlation coefficient $\gamma$), and under a substantial
amount of confounding (i.e. large $\alpha$), one can observe that the JIVE has a
smaller finite-sample bias than the TSLS estimator. Similarly, under the same
scenario, the CLS-JIVE has a correspondingly smaller finite-sample bias than
the CLS-TSLS estimator. This improvement in the CLS-JIVE was particularly
remarkable, because the proportion parameter, $\hat{\pi}_n$, was estimated using only
$B = 100$ bootstrap samples. Thus, it appears that a relatively small number of
resamples is sufficient to produce a CLS-JIVE estimator that outperforms the
CLS-TSLS estimator. One may therefore conjecture that the CLS framework
could be used in conjunction with other asymptotically unbiased estimators,
even when the proportion parameter is not available in closed-form.

5. Applications to Econometrics 

Our proposed methods have been applied to a re-analysis of a classic data set in
econometrics, originally published by Angrist and Krueger (1991), which aimed
to relate educational attainment with earnings. This particular study has been
the subject of numerous replications and re-analyses, and therefore provides us
with a well-known example for evaluating the performance of CLS estimation
in a real-world data set.

5.1. Quarter-of-Birth as Instrument 

Angrist and Krueger (1991) reported a small but persistent seasonal pattern of 
educational attainment over several decades between the 1920s and the 1950s. 
They observed that two discrepant regulations in the United States during that 
period have led to a ‘natural experiment’, in which individual differences in 
completed years of education could be predicted by an individual’s season of 
birth. On one hand, nationwide school-entry requirements controlled the age at 
which a given child began school. Indeed, at that period in the US, all children 
were expected to reach six years of age by the first of January of their first year 
Fig 8. Graphical representation of the IV model described in equations (18) and
(19), composed of a vector of endogenous variables, $X_1$, and a vector of exogenous
variables $X_2$. This graph corresponds to a two-level system of equations composed of
$Y = X_1\beta_1 + X_2\beta_2 + U\alpha + \epsilon$, and $X_1 = Z_1\Gamma_1 + X_2\Gamma_2 + U\alpha + \delta$. This model can be
seen to be a generalization of the simpler model described in figure 1.


at school. Thus, children born early in the year were likely to be older than their 
peers in the same class. On the other hand, state-specific compulsory schooling 
laws solely required pupils to remain in school until their sixteenth birthday. 
Therefore, pupils born in the first quarters of the year, wishing to leave school 
early, could do so at an earlier stage than their peers born later in the year. 

These two regulations (school-entry requirements, and compulsory schooling
laws) therefore conspired to enable children born in the early quarters of the
year to complete a smaller number of years of education, if they were so
inclined. Crucially, Angrist and Krueger (1991) highlight that the randomness of
an individual’s birth date is unlikely to be related to other events in an
individual’s life; thereby precluding quarter-of-birth from being a significant predictor
of an individual’s revenue later in life. Thus, quarter-of-birth could be argued
to constitute a legitimate instrument for educational attainment, fulfilling the
exclusion criterion, in the sense that it is not directly related to earnings. Note,
however, that some authors have disputed the validity of quarter-of-birth as an
instrument for education (Bound and Jaeger, 1996).

5.2. Model with Extraneous Covariates 

The model used by Angrist and Krueger (1991) generalizes the IV model
described in sections 2.1 and 2.2. Here, the vector of $k$ predictors, $X$, is partitioned
into $k_1$ endogenous variables denoted by $X_1$, and $k_2$ exogenous variables denoted
by $X_2$. In Angrist and Krueger’s model, the sole endogenous variable of interest
is the completed years of education of each subject. Therefore, we have $k_1 = 1$.
The outcome variable, $Y$, which represents the log-transformed weekly wage in
dollars of each subject, is then modelled as follows,

$$ Y = X_1\beta_1 + X_2\beta_2 + \epsilon, \qquad (18) $$
Table 1
Replications and extensions of the IV analysis in Angrist and Krueger (1991).

Covariates ($X_2$)          OLS        TSLS        JIVE      CLS-TSLS
----------------------------------------------------------------------
A.(a)  Estimator:         0.0802     0.0769      0.0755      0.0800
       Std. Error:       (0.0004)   (0.0150)    (0.0210)    (0.0126)
       Proportion:                                           (0.95)

B.(b)  Estimator:         0.0802     0.1398     -0.1276      0.1254
       Std. Error:       (0.0004)   (0.0334)    (1.7233)    (0.0317)
       Proportion:                                           (0.24)

C.(c)  Estimator:         0.0701     0.0669      0.0650      0.0700
       Std. Error:       (0.0004)   (0.0151)    (0.0234)    (0.0097)
       Proportion:                                           (0.96)

D.(d)  Estimator:         0.0701     0.1065      0.0224      0.0899
       Std. Error:       (0.0004)   (0.0334)    (2.1380)    (0.0235)
       Proportion:                                           (0.46)
----------------------------------------------------------------------
(a) This only includes ten dummies for the years of birth.
(b) This includes years of birth, and age with the exclusion of the 1929
    dummy.
(c) Covariates include years of birth, and some extraneous covariates
    described in the text.
(d) Covariates are years of birth, age, and other extraneous covariates, with
    the exclusion of the 1929 dummy.



where $\beta_1$ and $\beta_2$ are column vectors of dimension $k_1$ and $k_2$, respectively. The
endogeneity of $X_1$ leads to the use of a row vector of instruments, $Z_1$, of
dimension $1 \times l_1$, here denoting a set of dummy variables for the interactions between
quarters and years of birth. These instruments are combined with the $k_2$
exogenous variables from the second-level equation in order to produce the following
first-level equation,

$$ X_1 = Z_1\Gamma_1 + X_2\Gamma_2 + \delta, \qquad (19) $$

where $\Gamma_1$ and $\Gamma_2$ are vectors of parameters of order $l_1 \times k$ and $k_2 \times k$, respectively;
where $k := k_1 + k_2$. A graphical representation of this model is given in figure 8.

In matrix notation, given a sample of $n$ subjects, this model can be expressed
with respect to an $n$-dimensional column vector of error terms, $\epsilon$, for the
second-level equation; and a matrix, $D$, of error terms of order $n \times k_1$ for the
first-level equation. Altogether, we thus have the following linear system,

$$ \begin{aligned} y &= X_1\beta_1 + X_2\beta_2 + \epsilon, \\ X_1 &= Z_1\Gamma_1 + X_2\Gamma_2 + D. \end{aligned} $$


Moreover, we can construct the following block matrices, $X := [X_1 \; X_2]$ and
$Z := [Z_1 \; X_2]$ that are of order $n \times k$ and $n \times l$, respectively; in which we have
used $k := k_1 + k_2$ and $l := l_1 + k_2$. In addition, we also define the vectors
of parameters $\beta := [\beta_1' \; \beta_2']'$ and $\Gamma := [\Gamma_1' \; \Gamma_2']'$, of order $k \times 1$ and $l \times k$,
respectively. Equipped with these block matrices, we can immediately recover
the standard TSLS estimator formula described in section 2.2. It also follows
that this model is well-identified whenever $l_1 \geq k_1$ is satisfied, as in the model
at hand.
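The block construction can be illustrated with a small two-stage computation.
The sketch below assumes the usual two-stage form of the TSLS estimator (the
formula of section 2.2 is not reproduced here): the endogenous column is first
regressed on Z = [Z_1 X_2], and the outcome is then regressed on the fitted
column together with X_2. The helper functions, the simulated data and all
coefficient values are illustrative assumptions, with k_1 = 1, k_2 = 2
(an intercept and one covariate) and l_1 = 3.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, y):
    """Coefficients of the regression of y on the columns of rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def tsls(y, x1, x2, z1):
    """Two-stage fit; returns coefficients ordered as [beta_1, beta_2...]."""
    z = [zi + wi for zi, wi in zip(z1, x2)]                   # Z := [Z_1 X_2]
    g = least_squares(z, x1)                                  # first-level fit
    x1_hat = [sum(gj * zj for gj, zj in zip(g, row)) for row in z]
    return least_squares([[xh] + wi for xh, wi in zip(x1_hat, x2)], y)

rng = random.Random(3)
beta_1 = 0.5
y, x1, x2, z1 = [], [], [], []
for _ in range(4000):
    w, u = rng.gauss(0, 1), rng.gauss(0, 1)                   # covariate, confounder
    inst = [rng.gauss(0, 1) for _ in range(3)]
    edu = 0.4 * sum(inst) + 0.3 * w + 0.5 * u + rng.gauss(0, 0.5)
    y.append(1.0 + beta_1 * edu + 0.2 * w + 0.5 * u + rng.gauss(0, 0.5))
    x1.append(edu); x2.append([1.0, w]); z1.append(inst)

b_tsls = tsls(y, x1, x2, z1)
b_ols = least_squares([[a] + r for a, r in zip(x1, x2)], y)   # X := [X_1 X_2]
print("TSLS coefficient on X_1:", b_tsls[0])
print("OLS  coefficient on X_1:", b_ols[0])
```

In this simulated example the OLS coefficient on the endogenous variable is
pulled away from the true value by the confounder, whereas the two-stage fit
stays close to it.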

5.3. Results of the Re-analysis 

The results described in table 4 of Angrist and Krueger (1991) have been
replicated and extended. In this portion of their analysis, Angrist and Krueger have
considered the cohort of men born between 1920 and 1929. This constitutes a
sample of $n = 247,199$ subjects. All data are here based on the 1970 US census.
Using the notation introduced in equations (18) and (19), the outcome variable,
$Y$, is defined as the mean log-transformed weekly wages; the endogenous
variable, $X_1$, is the number of completed years of education; and the instrument, $Z_1$,
is composed of a vector of interaction terms between quarter-of-year dummies
and year-of-birth dummies, totalling 40 different instruments.

In addition, the authors have also considered different sets of exogenous
covariates, denoted by $X_2$ in equations (18) and (19). The choice of exogenous
covariates has been reported as covariate scenarios A to D in table 1. All
scenarios include ten dummy variables for each year of birth, except scenarios B
and D, in which the 1929 dummy variable has been removed due to
multicollinearity, following Angrist and Krueger (1991). In scenarios B and D, this is
supplemented by an age covariate. (Note that we have not included age squared
in this analysis as was conducted in Angrist and Krueger (1991), since age and
age squared were found to be almost perfectly correlated.) Finally, scenarios
C and D include some further dummy variables for race, marital status, eight
different regions of residence, and whether or not the subjects were primarily
located in a standard metropolitan statistical area (SMSA).

For scenarios A and C, the OLS and TSLS columns in table 1 are exact
replicates of the results described in Angrist and Krueger (1991). The values
and standard errors for these estimators are slightly different under scenarios
B and D, due to the non-inclusion of age squared in the present analysis. The
variance of the CLS estimator was computed using the bootstrap, as described
in section 2.6. As expected, one can observe that the CLS strikes a balance
between the OLS and the TSLS estimators, such that for all four scenarios, the
value of the CLS lies between the one of the OLS and the one of the
TSLS. The value taken by the estimate of the proportion parameter has also
been reported. By comparing scenarios A and C with scenarios B and D in table
1, it can be seen that the inclusion (resp. non-inclusion) of the age variable leads
to a decrease (resp. increase) in the value of $\hat{\pi}_n$.
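As a quick arithmetic check of table 1, the reported CLS-TSLS values can be
compared with the convex combination of the reported OLS and TSLS values. The
sketch assumes that the reported proportion is the weight placed on the OLS
estimator, and that the small residual gaps are due to the rounding of the
published figures.

```python
# (OLS, TSLS, proportion, reported CLS-TSLS) for scenarios A to D of table 1.
table_one = {
    "A": (0.0802, 0.0769, 0.95, 0.0800),
    "B": (0.0802, 0.1398, 0.24, 0.1254),
    "C": (0.0701, 0.0669, 0.96, 0.0700),
    "D": (0.0701, 0.1065, 0.46, 0.0899),
}

gaps = {}
for scenario, (ols, tsls, proportion, cls_reported) in table_one.items():
    # Convex combination with weight `proportion` on the OLS estimate.
    combination = proportion * ols + (1.0 - proportion) * tsls
    gaps[scenario] = abs(combination - cls_reported)
    print(scenario, round(combination, 4), cls_reported)
```

In all four scenarios, the recomputed combination agrees with the reported
CLS-TSLS value to within the precision allowed by a two-decimal proportion.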

Thus, this re-analysis suggests that while the use of quarter-of-birth as an 
IV for education may be justified when age is included as a supplementary 
exogenous variable in the analysis; it appears that an estimator closer to the 
OLS is sufficient, when the age variable is not included. The JIVE and its CLS 
counterpart have also been reported for comparison. The behavior of these two 
estimators is comparable to the one of the TSLS and CLS-TSLS estimators.
Note that for computational convenience, the variances of the CLS estimators
and the proportion parameter of the CLS-JIVE have been here estimated using
bootstraps based on solely 10^ resamples. This real data analysis thus
demonstrates that a small number of bootstrap samples suffices to produce reasonable
estimates of the standard errors of the CLS estimators.

6. Conclusions 

In this paper, we have shown that different IV and non-IV estimators can be the
object of convex combinations, and that the proportion parameters of these
combinations can be consistently estimated from the data. Such CLS estimators are
therefore particularly attractive, since they automatically down-weight the
influence of weak instruments, when these are not expected to lead to a large
reduction in bias. Moreover, this inferential framework bears some similarities
with the Hausman test. Theoretically, our proposed estimator minimizes an
empirical MSE over a restricted class of estimators, consisting of all the possible
convex combinations of the OLS and TSLS estimators. We have also seen that the
TSLS estimator in the definition of the CLS can be replaced by other estimators
such as the JIVE.

For finite $n$, the moments of the TSLS estimator and other $k$-class estimators
need not exist, as demonstrated by Kinal (1980). It is common in such situations
to assume that at least three instruments are present. This condition ensures
that the first two moments of the estimators under scrutiny exist; and the first
two moments of the OLS and TSLS estimators are needed to compute our
proposed empirical estimator of the MSE. However, note that, asymptotically,
all moments exist and that, theoretically, this strategy can thus be applied to any
number of instruments. Indeed, irrespective of the number of instruments used,
every CLS estimator is guaranteed to be asymptotically consistent. Observe
that a similar issue arises for all IV models, in the sense that such models are
only asymptotically identifiable. In the present case, our proposed estimation
procedure only possesses asymptotic first and second moments. However, just
as finite-sample non-identifiability is not a point of concern for the use of IV
methods in practice, the reliance of the CLS framework on asymptotic moments
does not constitute a significant hindrance to the general application of this
method.

The interpretation of the combination of several estimators such as the OLS 
and TSLS estimators relies on the assumption of effect homogeneity [REFs 
needed]. That is, we are assuming that the causal effect is identical for all sub¬ 
populations. This is a strong assumption, which needs not hold in practice. 
Thus, further research will be needed to clarify the assumptions required for 
employing the CLS, when effect homogeneity is not expected to hold. Indeed, in 
such cases, the OLS and the TSLS estimators may represent the local treatment 
effects in different subpopulations. Therefore, the resulting convex combination 
of such estimators may be difficult to interpret. Additional assumptions may
be required to ensure that the different estimators of interest are sufficiently
comparable to be combined in this fashion. 

Observe that the estimators utilized to produce the CLS estimator do not 
need to share the same data. Indeed, when constructing a combination of the 
OLS and TSLS estimators, only the TSLS estimator relies on the instrument, 
Z. Thus, one may also consider how such a framework could be extended to 
other modelling strategies, such as mixed-effects models for longitudinal data
(see Wooldridge, 2002). Similarly, this method could also be extended to models
including measurement errors. Calibrated regression is often used in conjunction
with explanatory variables, in order to diminish the effect of measurement
error. In such cases, the resulting combination utilizes estimators based on
distinct data sets. In addition, recall that all of the theoretical results derived
in this paper rely on the assumption that the instruments under scrutiny are
valid, in the sense that (A4) is assumed to hold. The consequences of relaxing
this assumption are difficult to anticipate, and further research should certainly
consider such situations, as has been done by previous authors in the case of
the TSLS estimator (Jackson and Swanson, In press).

Thus far, we have combined the OLS estimator with either the TSLS estimator
or the JIVE. Observe, however, that we are not restricted to choosing the
OLS as a reference estimator. Within the bootstrap CLS framework described
in section 3.2, one could also choose to combine the TSLS and the JIVE, for
instance. In general, any pair of estimators could be the object of a convex
combination. For such a combination to be useful, it suffices that these
estimators are ordered in terms of bias and variance, as in the canonical case of the
OLS and TSLS estimators given in proposition 1. Another natural theoretical
extension of the current work would be to derive a central limit theorem for the
CLS estimator. This would allow researchers to obtain approximate confidence
intervals for the CLS estimator, using normal asymptotic theory. Such a central 
limit theorem would also enable the construction of adequate statistical tests for 
evaluating whether or not the values of individual parameters are statistically 
significant. Such extensions are not expected to be too arduous, since under the 
assumptions stated in this paper, the CLS estimator is consistent; and moreover, 
estimators such as the TSLS or JIVE are known to be asymptotically normally 
distributed under standard assumptions. 

Appendix A: Proofs of Propositions 

Proof of Proposition 1. The proof of (i) immediately follows from the definition
of the empirical bias in equation (8), which implies that the empirical bias of the
TSLS estimator is identically zero, for every realization. The proof of (ii) can
be conducted in two steps. Firstly, one can show that for every pair of matrices
$X$ and $\hat{X} := H_Z X$, we have

\[ X'X \succeq \hat{X}'\hat{X}. \tag{20} \]

Observe that we have the following equivalence due to the symmetry of $H_Z$,

\[ \hat{X}'X = (H_Z X)'X = X'H_Z X = X'\hat{X}. \tag{21} \]

Secondly, the inner product of $\hat{X}$ can also be simplified using the idempotency
of $H_Z$, such that

\[ \hat{X}'X = X'H_Z X = X'H_Z H_Z X = \hat{X}'\hat{X}. \tag{22} \]

Then, expanding the dot product of $X - \hat{X}$, and applying equalities (21) and
(22), we obtain

\[ (X - \hat{X})'(X - \hat{X}) = X'X - 2\hat{X}'X + \hat{X}'\hat{X} = X'X - \hat{X}'\hat{X}. \]

Observe that the dot product, $(X - \hat{X})'(X - \hat{X})$, is a Gram matrix, and therefore
it is necessarily positive semi-definite. Consequently, this implies that

\[ X'X - \hat{X}'\hat{X} = (X - \hat{X})'(X - \hat{X}) \succeq 0, \]

and hence $X'X \succeq \hat{X}'\hat{X}$, by the definition of the positive semidefinite order.
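
As a purely illustrative numerical check of this first step (not part of the original argument), the following sketch builds the projection matrix $H_Z$ from a simulated instrument matrix and confirms that $X'X - \hat{X}'\hat{X}$ coincides with the Gram matrix of $X - \hat{X}$. The dimensions, the seed and the data-generating process are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, l = 200, 2, 3            # sample size, regressors, instruments (illustrative)
Z = rng.normal(size=(n, l))    # hypothetical instrument matrix
X = Z @ rng.normal(size=(l, k)) + rng.normal(size=(n, k))

# Projection matrix onto the column space of Z: symmetric and idempotent.
H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
X_hat = H @ X

gram_diff = X.T @ X - X_hat.T @ X_hat          # X'X minus the projected Gram matrix
residual_gram = (X - X_hat).T @ (X - X_hat)    # Gram matrix of X - X_hat

# A Gram matrix is positive semi-definite, so the smallest eigenvalue
# of the difference should be non-negative up to rounding error.
min_eig = np.linalg.eigvalsh(gram_diff).min()
print(np.allclose(gram_diff, residual_gram), min_eig)
```

The check relies only on the symmetry and idempotency of the projection, so it should hold for any full-rank choice of Z.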

Next, observe that the estimates of the error variances under the OLS and
TSLS estimation procedures are defined as

\[ (n-k)\hat\sigma_n^2 := (y - X\hat\beta_n)'(y - X\hat\beta_n)
   \le (y - X\tilde\beta_n)'(y - X\tilde\beta_n) =: (n-k)\tilde\sigma_n^2, \]

where the inequality follows from the optimality of the OLS; and therefore,

\[ \hat\sigma_n^2 (X'X)^{-1} \preceq \tilde\sigma_n^2 (\hat{X}'\hat{X})^{-1}, \]

since assumption (A6) guarantees that both sides are invertible. □
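
The variance ordering can also be examined numerically. The sketch below is an illustration under assumed settings (a simulated confounded linear model with three instruments; all names and parameter values are hypothetical). It computes the OLS and TSLS estimates, forms both estimated covariance matrices with residuals based on the original X, and inspects the smallest eigenvalue of their difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, l = 500, 2, 3                        # illustrative dimensions
Z = rng.normal(size=(n, l))                # hypothetical instruments
u = rng.normal(size=n)                     # hypothetical unmeasured confounder
X = Z @ rng.normal(size=(l, k)) + u[:, None] + rng.normal(size=(n, k))
y = X @ np.ones(k) + u + rng.normal(size=n)

# First-stage fitted values, i.e. the projection of X onto the span of Z.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_tsls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Both error-variance estimates use residuals computed with the original X.
s2_ols = np.sum((y - X @ beta_ols) ** 2) / (n - k)
s2_tsls = np.sum((y - X @ beta_tsls) ** 2) / (n - k)

var_ols = s2_ols * np.linalg.inv(X.T @ X)
var_tsls = s2_tsls * np.linalg.inv(X_hat.T @ X_hat)

# Non-negative smallest eigenvalue means var_ols is below var_tsls
# in the positive semidefinite order.
gap_min_eig = np.linalg.eigvalsh(var_tsls - var_ols).min()
print(s2_ols, s2_tsls, gap_min_eig)
```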

Proof of Corollary 1. For inequality (i), observe that the empirical bias of the
TSLS estimator is identically zero for every realization and every n. Moreover,
by equation (8), the empirical bias of the OLS estimator is given by

\[ \widehat{\operatorname{Bias}}^2(\hat\beta_n) = (\hat\beta_n - \tilde\beta_n)(\hat\beta_n - \tilde\beta_n)' \succeq 0, \]

since the LHS is a Gram matrix, and is therefore positive semidefinite for every
realization. Since it holds for every n, this positive semidefiniteness is preserved
in the limit. Moreover, the inequality in (ii) immediately follows from the fact
that the variances of these two estimators converge to the zero matrix. □

Proof of Proposition 2. The optimal value of $\pi_n$ can be found by minimizing
the criterion of interest, which will be denoted by $f(\pi) := \operatorname{MSE}(\bar\beta_n(\pi))$. For
expediency, we will expand this criterion as was done in equation (10) such that

\[ \operatorname{tr} f(\pi) = \operatorname{tr}\big(\pi^2 M_1 + 2\pi(1-\pi)C + (1-\pi)^2 M_2\big), \]

with $M_1 := \operatorname{MSE}(\hat\beta_n)$, $C := \operatorname{CSE}(\hat\beta_n, \tilde\beta_n)$, and $M_2 := \operatorname{MSE}(\tilde\beta_n)$; and where
recall that $\hat\beta_n$ and $\tilde\beta_n$ denote the OLS and TSLS estimators, respectively. Since
the derivative is a linear operator, it commutes with the trace, and we obtain

\[ \operatorname{tr}(\partial f/\partial\pi) = 2\pi\operatorname{tr}(M_2 - 2C + M_1) - 2\operatorname{tr}(M_2 - C). \]

Setting this expression to zero yields $\pi_n := \operatorname{tr}(M_2 - C)/\operatorname{tr}(M_2 - 2C + M_1)$,
as required. Naturally, this minimization holds for every choice of n, thereby
proving the first part of proposition 2.
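
The closed-form proportion can be illustrated with a minimal sketch. The function names `optimal_proportion` and `trace_mse` and the three input matrices are hypothetical, made up for the example rather than taken from the paper; the code evaluates the closed form and compares it against a grid search over [0, 1].

```python
import numpy as np

def optimal_proportion(M1, M2, C):
    """Weight on the first estimator: tr(M2 - C) / tr(M2 - 2C + M1)."""
    denom = np.trace(M2 - 2 * C + M1)
    if np.isclose(denom, 0.0):
        raise ValueError("criterion is flat in pi; the minimizer is not unique")
    return np.trace(M2 - C) / denom

def trace_mse(pi, M1, M2, C):
    """Trace of the MSE of the convex combination with weight pi."""
    return np.trace(pi**2 * M1 + 2 * pi * (1 - pi) * C + (1 - pi) ** 2 * M2)

# Made-up inputs, used purely for illustration.
M1 = np.array([[0.30, 0.02], [0.02, 0.20]])   # stands in for the MSE of the OLS
M2 = np.array([[0.50, 0.01], [0.01, 0.40]])   # stands in for the MSE of the TSLS
C = np.array([[0.10, 0.00], [0.00, 0.05]])    # stands in for the cross term

pi_star = optimal_proportion(M1, M2, C)
grid = np.linspace(0.0, 1.0, 1001)
grid_min = min(trace_mse(p, M1, M2, C) for p in grid)
print(pi_star, trace_mse(pi_star, M1, M2, C), grid_min)
```

Note that the unconstrained closed form need not lie in [0, 1] for arbitrary inputs; in this example it does.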

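In addition, one can show that this minimizer is unique by performing a
second derivative test, such that we obtain

\[ \operatorname{tr}(\partial^2 f/\partial\pi^2) = 2\operatorname{tr}(M_1 - 2C + M_2). \tag{23} \]

Since by assumption, the random vectors, $\hat\beta_n$ and $\tilde\beta_n$, are elementwise
square-integrable, the components, $E[(\hat\beta_{nj} - \beta_j)^2]$, of $M_1$ are finite for every $j = 1,\ldots,k$.
Hence, using the linearity of the trace, the MSE of $\hat\beta_n$ can be treated as a sum
of real numbers, thereby yielding the $L^2$-norm on $\mathbb{R}^k$ such that

\[ \big(\operatorname{tr}\operatorname{MSE}(\hat\beta_n)\big)^{1/2}
   = \Big(\sum_{j=1}^{k} E[(\hat\beta_{nj} - \beta_j)^2]\Big)^{1/2} =: \|\hat\beta_n - \beta\|. \]

The latter quantity will be referred to as the root (trace) MSE of $\hat\beta_n$, and will
be denoted by RMSE.

By the same reasoning, it can be shown that $C$ and $M_2$ correspond to the
inner product, $\langle\hat\beta_n - \beta, \tilde\beta_n - \beta\rangle$, and the squared norm, $\|\tilde\beta_n - \beta\|^2$, on $\mathbb{R}^k$. Thus,
equation (23) can now be re-expressed as follows,

\[ \operatorname{tr}(\partial^2 f/\partial\pi^2)
   = 2\big(\|\hat\beta_n - \beta\|^2 - 2\langle\hat\beta_n - \beta, \tilde\beta_n - \beta\rangle + \|\tilde\beta_n - \beta\|^2\big). \]

The Cauchy-Schwarz inequality can here be invoked to produce an upper bound
on the cross-term in the latter equation,

\[ \langle\hat\beta_n - \beta, \tilde\beta_n - \beta\rangle \le \|\hat\beta_n - \beta\| \cdot \|\tilde\beta_n - \beta\|. \]

It then suffices to complete the square in order to obtain the following lower
bound,

\[ \operatorname{tr}(\partial^2 f/\partial\pi^2) \ge 2\big(\|\hat\beta_n - \beta\| - \|\tilde\beta_n - \beta\|\big)^2 \ge 0, \]

for every n, and where equality only holds when the RMSEs of $\hat\beta_n$ and $\tilde\beta_n$ are
identical, as required. □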
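
A short worked identity may help in reading the second-derivative test. It is not part of the original proof: it writes $\hat\beta_n$ for the OLS and $\tilde\beta_n$ for the TSLS estimator, and it assumes that $C$ denotes the cross squared error $E[(\hat\beta_n - \beta)(\tilde\beta_n - \beta)']$, which is one reading of the CSE notation. Under that assumption, the trace appearing in (23) is the expected squared Euclidean distance between the two estimators:

```latex
\begin{align*}
\operatorname{tr}(M_1 - 2C + M_2)
  &= E\big[\{(\hat\beta_n - \beta) - (\tilde\beta_n - \beta)\}'
           \{(\hat\beta_n - \beta) - (\tilde\beta_n - \beta)\}\big] \\
  &= E\big[(\hat\beta_n - \tilde\beta_n)'(\hat\beta_n - \tilde\beta_n)\big] \;\ge\; 0 .
\end{align*}
```

Read this way, the curvature of the criterion is non-negative by construction, in agreement with the bound obtained through the Cauchy-Schwarz inequality.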

Proof of Proposition 4. Firstly, observe that the stochastic convergence of $\bar\beta_n(\pi)$
to $\pi\hat\beta + (1-\pi)\beta$ is immediate from the convergence of $\hat\beta_n$ and $\tilde\beta_n$ to their
respective limits, $\hat\beta$ and $\beta$. Naturally, this holds for every $\pi \in [0,1]$.

Secondly, recall that the empirical MSE of $\bar\beta_n(\pi)$ can be decomposed into a
variance and a bias term, as in equation (12), such that for every $\pi$, we have

\[ \widehat{\operatorname{MSE}}(\bar\beta_n(\pi)) = \widehat{\operatorname{Var}}(\bar\beta_n(\pi)) + \widehat{\operatorname{Bias}}^2(\bar\beta_n(\pi)). \]

The variances of both the OLS and TSLS estimators are known to converge to
zero. Moreover, the bias of the TSLS estimator is also known to converge to zero,
and this can be seen to be also true for the cross-bias term. Thus, the stochastic
limit of $\widehat{\operatorname{MSE}}(\bar\beta_n(\pi))$ reduces to the weighted limit of the empirical bias of the
OLS estimator, $\pi^2\widehat{\operatorname{Bias}}^2(\hat\beta_n)$. Using the consistency of the TSLS estimator, this
latter term satisfies

\[ \operatorname*{plim}_{n\to\infty}\widehat{\operatorname{Bias}}(\hat\beta_n)
   = \operatorname*{plim}_{n\to\infty}(\hat\beta_n - \tilde\beta_n)
   = \operatorname{Bias}(\hat\beta) = (\hat\beta - \beta), \]

since $\tilde\beta_n$ converges in probability to $\beta$, and $\hat\beta$ denotes the limit of $\hat\beta_n$, so that
$\hat\beta - \beta$ is the true bias of the OLS estimator.

The result then follows by using the continuous mapping theorem. □

Proof of Corollary 2. The minimization of the empirical MSE follows from the
arguments used in the first part of proposition 2, and simplifying the
closed-form formula for $\hat\pi_n$ using our adopted definitions for the empirical bias of the
TSLS estimator. □

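Proof of Proposition 3. This inequality relates the theoretical MSEs of the CLS,
OLS and TSLS estimators. This result follows from the convexity of the MSE
with respect to its argument, $\beta_n^\dagger$; in which $\beta_n^\dagger$ represents any candidate
estimator. The trace of the MSE can indeed be seen to be a convex quadratic form,
$x'Ax$, where $A$ is here the identity matrix and $x := \beta_n^\dagger - \beta$. That is, for every
estimator, $\beta_n^\dagger$, let

\[ f(\beta_n^\dagger - \beta) := \operatorname{tr}\big((\beta_n^\dagger - \beta)(\beta_n^\dagger - \beta)'\big)
   = (\beta_n^\dagger - \beta)'(\beta_n^\dagger - \beta), \]

using the cyclic property of the trace, which shows that this quadratic form is
convex. Thus, for every two estimators, $\hat\beta_n$ and $\tilde\beta_n$, and every $\pi \in [0,1]$; using
the fact that $\beta$ can always be expressed as $\pi\beta + (1-\pi)\beta$, we have

\[ f\big(\pi\hat\beta_n + (1-\pi)\tilde\beta_n - \beta\big)
   = f\big(\pi(\hat\beta_n - \beta) + (1-\pi)(\tilde\beta_n - \beta)\big)
   \le \pi f(\hat\beta_n - \beta) + (1-\pi) f(\tilde\beta_n - \beta). \]

Since by definition, $\bar\beta_n(\pi) := \pi\hat\beta_n + (1-\pi)\tilde\beta_n$, and moreover
$\operatorname{tr}(\operatorname{MSE}(\beta_n^\dagger)) = E[f(\beta_n^\dagger - \beta)]$ for every $\beta_n^\dagger$, it then follows that, using the
linearity of the expectation,

\[ \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi)) \le \pi\operatorname{tr}\operatorname{MSE}(\hat\beta_n) + (1-\pi)\operatorname{tr}\operatorname{MSE}(\tilde\beta_n). \]

Finally, observing that the set of convex combinations of the form,
$\pi f(\hat\beta_n - \beta) + (1-\pi) f(\tilde\beta_n - \beta)$, necessarily includes the endpoints of the
corresponding line segment, we hence obtain

\[ \min_{\pi}\,\operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi))
   \le \min_{\pi}\big\{\pi\operatorname{tr}\operatorname{MSE}(\hat\beta_n) + (1-\pi)\operatorname{tr}\operatorname{MSE}(\tilde\beta_n)\big\}
   \le \min\big\{\operatorname{tr}\operatorname{MSE}(\hat\beta_n), \operatorname{tr}\operatorname{MSE}(\tilde\beta_n)\big\}, \]

as required. □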
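
The inequality of proposition 3 can be illustrated with a small Monte Carlo sketch. This is a hedged example rather than the simulation design of the paper: the data-generating process, the three instruments, the sample size and the number of replicates are all assumptions, and the theoretical moments are replaced by their Monte Carlo averages with the true parameter treated as known.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta = 100, 2000, 1.0        # illustrative simulation settings

ols = np.empty(reps)
tsls = np.empty(reps)
for r in range(reps):
    Z = rng.normal(size=(n, 3))                  # three hypothetical instruments
    u = rng.normal(size=n)                       # unmeasured confounder
    x = Z @ np.array([0.5, 0.5, 0.5]) + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)
    x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)    # first-stage fitted values
    ols[r] = (x @ y) / (x @ x)
    tsls[r] = (x_hat @ y) / (x_hat @ x_hat)

# Monte Carlo counterparts of M1, M2 and C (scalars here, since k = 1).
m1 = np.mean((ols - beta) ** 2)
m2 = np.mean((tsls - beta) ** 2)
c = np.mean((ols - beta) * (tsls - beta))

# Closed-form proportion, restricted to [0, 1].
pi_star = float(np.clip((m2 - c) / (m2 - 2 * c + m1), 0.0, 1.0))
cls = pi_star * ols + (1 - pi_star) * tsls
mse_cls = np.mean((cls - beta) ** 2)
print(pi_star, mse_cls, m1, m2)
```

Because the combined MSE is an exact quadratic in the proportion when computed from the same replicates, the combined estimator cannot do worse than either endpoint in this check; how much it improves depends on the assumed confounding and instrument strength.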
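Proof of Proposition 5. We use a sandwich argument to prove that the CLS
estimator is MSE consistent, as stated in (ii). For every $p \in [0,1]$, let

\[ T_n(p) := \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(p)) - \operatorname{tr}\operatorname{MSE}(\bar\beta_n(p)). \]

By proposition 4, we know that $T_n(p) \to 0$ in probability, for every $p$. However, the quantity
of interest in the case of proposition 5 can be defined as follows,

\[ T_n^*(p) := \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(p)) - \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi)); \]

with $p$ being chosen as $\hat\pi_n$, and where $\pi$ denotes the true proportion minimizing
the theoretical MSE. Firstly, observe that we can find a lower bound for $T_n^*(\hat\pi_n)$
with respect to $T_n(p)$ by a judicious choice of $p$. That is,

\[ T_n(\hat\pi_n) = \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(\hat\pi_n)) - \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\hat\pi_n))
   \le \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(\hat\pi_n)) - \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi)) = T_n^*(\hat\pi_n), \]

since, by definition, $\pi := \operatorname{argmin}_p \operatorname{tr}\operatorname{MSE}(\bar\beta_n(p))$. Secondly, one can also derive an
upper bound for $T_n^*(\hat\pi_n)$, as follows,

\[ T_n^*(\hat\pi_n) = \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(\hat\pi_n)) - \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi))
   \le \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(\pi)) - \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi)) = T_n(\pi), \]

since $\hat\pi_n := \operatorname{argmin}_p \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(p))$. Therefore, we obtain the following sandwich
inequality,

\[ T_n(\hat\pi_n) \le T_n^*(\hat\pi_n) \le T_n(\pi), \]

which could be re-expressed as follows,

\[ |T_n^*(\hat\pi_n)| \le \max\{|T_n(\hat\pi_n)|, |T_n(\pi)|\} \overset{p}{\longrightarrow} 0, \]

where the weak convergence to zero was stated in proposition 4. Thus, we have
demonstrated that

\[ \min_p \operatorname{tr}\widehat{\operatorname{MSE}}(\bar\beta_n(p)) \overset{p}{\longrightarrow} \min_p \operatorname{tr}\operatorname{MSE}(\bar\beta_n(p)). \]

By proposition 3, we can use the MSEs of the OLS and TSLS estimators as an
upper bound for the RHS of the latter equation such that

\[ \min_{\pi} \operatorname{tr}\operatorname{MSE}(\bar\beta_n(\pi)) \le \min\{\operatorname{tr}\operatorname{MSE}(\hat\beta_n), \operatorname{tr}\operatorname{MSE}(\tilde\beta_n)\} \to 0, \]

since we know that the TSLS is MSE consistent. Therefore, $\bar\beta_n(\hat\pi_n) \to \beta$ in mean square, as
required. Moreover, the weak consistency of the CLS estimator stated in (i)
is a direct consequence of its MSE consistency (see, for example, Bain and
Engelhardt, 1992, p.313). □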
Proof of Corollary 3. The proof of this inequality is analogous to the proof of
proposition 3, in which the theoretical expectation with respect to $F$ is replaced
with the bootstrap expectation with respect to $F^*$. Here, the true parameter is
taken to be $E^*[\beta_n^\dagger]$, where $\beta_n^\dagger$ is an asymptotically unbiased estimator of interest,
and $\bar\beta_n(\pi) := \pi\hat\beta_n + (1-\pi)\beta_n^\dagger$ is the BCLS estimator. Letting $f(x) := x'x$, and
using the convexity of this quadratic form, we obtain

\[ f\big(\bar\beta_n(\pi) - E^*[\beta_n^\dagger]\big)
   \le \pi f\big(\hat\beta_n - E^*[\beta_n^\dagger]\big) + (1-\pi) f\big(\beta_n^\dagger - E^*[\beta_n^\dagger]\big), \]

for every $\pi$. Moreover, by definition, $\operatorname{tr}(\operatorname{MSE}^*(\bar\beta_n)) = E^*[f(\bar\beta_n - E^*[\beta_n^\dagger])]$. Thus, using
linearity, this gives

\[ \operatorname{tr}\operatorname{MSE}^*(\bar\beta_n(\pi)) \le \pi\operatorname{tr}\operatorname{MSE}^*(\hat\beta_n) + (1-\pi)\operatorname{tr}\operatorname{MSE}^*(\beta_n^\dagger). \]

The required inequality then follows by minimizing both sides with respect to
$\pi \in [0,1]$, and noticing that every set of convex combinations contains the
endpoints of the corresponding line segment, as in proposition 3. □